Abstract

In this article, we propose a new construction of probabilistic collusion-secure fingerprint codes against 
up to three pirates and give a theoretical security evaluation. Our pirate tracing algorithm combines a 
scoring method analogous to Tardos codes (J. ACM, 2008) with an extension of parent search techniques 
of some preceding 2-secure codes. Numerical examples show that our code lengths are significantly shorter
than (about 30% to 40% of) the shortest known c-secure codes by Nuida et al. (Des. Codes Cryptogr.,
2009) with c = 3. A preliminary proposal for improving the efficiency of our tracing algorithm is also
given.



1 Introduction 

1.1 Background and Related Works 



Digital content distribution services have recently become widespread thanks to progress in information
technology. Digitization of content distribution has improved convenience for ordinary users. However,
it also enables malicious persons to perform more powerful attacks, and the amount of illegal
content redistribution is increasing very rapidly. Hence technical countermeasures against such illegal activities
are strongly desired. The use of a fingerprint code is a possible solution to such problems: it aims at making
the attacker (pirate) traceable when an illegally redistributed digital content is found, thus deterring
potential attackers from performing actual attacks.

In the context of fingerprint codes, each copy of a content is divided into several segments (common 
to all copies), in each of which a bit of an encoded user ID is embedded by the content provider by using 
watermarking technique. The embedded encoded ID (fingerprint) provides traceability of an adversarial 
user (pirate) when an unauthorized copy of the content is distributed. Such a scheme aims at tracing some 
pirate, without falsely tracing any innocent user, from the fingerprint embedded in the pirated content 
with an overwhelming probability. It has been noticed that a coalition of pirates can perform certain strong
attacks (collusion attacks) on the fingerprint. Therefore any effective fingerprint code should be secure against
collusion attacks; such codes are called collusion-secure codes. In particular, if the code is secure against
collusion attacks by up to c pirates, then the code is called c-secure [2].

Several constructions of collusion-secure codes have been proposed so far. Among them, the one proposed
by Tardos [14] is "asymptotically optimal", in the sense that the order of his code length with respect to
the allowable number c of pirates is theoretically the lowest (which is quadratic in c). As an improvement of
Tardos codes, the constant factor of the asymptotic code length has been reduced by the c-secure codes given
by Nuida et al. [10] to approximately 5.35% of that of Tardos codes, which is the smallest value so far provable
without any additional assumption. On the other hand, after the first proposal of Tardos codes there were
proposed several collusion-secure codes, e.g., [TJ 02 E] E] [IT], which restrict the number of pirates to c = 2 
but achieve further short code lengths. Such constructions of short c-secure codes for a small c would have 
not only theoretical but also practical importance; for example, when the users are less anonymous for the 
content provider (e.g., the case of secret documents distributed in a company), it seems infeasible to make a 
large coalition confidentially. The aim of this article is to extend such a "compact" construction to the next
case c = 3.

For related works, we note that there is an earlier work by Sebe and Domingo-Ferrer [13] for 3-secure
codes. On the other hand, there is another work by Kitagawa et al. [5] on construction of 3-secure codes, in 
which very short code lengths are proposed but its security is evaluated only by computer experiments for 
some special attack strategies. 

1.2 Our Contribution 

In this article, we propose a new construction of 3-secure codes and give a theoretical security evaluation. The 
codeword generation algorithm is just a bit-wise random sampling, which has been used by many preceding 
constructions as well. The novel point of our construction is in the pirate tracing algorithm, which combines 
the use of score computation analogous to Tardos codes [14] with an extension of the "parent search" technique
of some preceding works against two pirates [TJ [SJ 111) . Intuitively, the score computation method works 
well when the parts of fingerprint in the pirated content are chosen evenly from the codewords of pirates, 
while the extended "parent search" technique works well when the fingerprint is not evenly chosen from the 
codewords of pirates, therefore their combination is effective. 

In comparison under some parameter choices, our code lengths are approximately 3% to 4% of those of the 3-secure
codes by Sebe and Domingo-Ferrer [13], and approximately 30% to 40% of those of the c-secure codes by Nuida et al.
[10] for c = 3. This shows that our code length is significantly shorter even than the shortest known c-secure
codes [10].

In fact, Kitagawa et al. [5] claimed that their 3-secure code provides almost the same security level as our 
code for the case of 100 users and 128-bit length. However, they evaluated the security by only computer 
experiments for the case of some special attack algorithms (and they studied just one parameter choice as
above), while in this article we give a theoretical security evaluation for arbitrary attack algorithms under the
standard Marking Assumption (cf., [2]). (One may think that the perfect protection of so-called undetectable 
positions required by Marking Assumption is not practical. However, this is in fact not a serious problem, 
as a general conversion technique recently proposed by Nuida [7] can supply robustness against erasure of a 
bounded number of undetectable bits.) 

Moreover, for the sake of improving efficiency of our tracing algorithm, we also discuss an implementation 
method for the algorithm. By an intuitive observation, it seems indeed more efficient for an average case 
than the naive implementation. A detailed evaluation of the proposed implementation method will be a 
future research topic. 

1.3 Notations 

In this article, log denotes the natural logarithm. We put $[n] = \{1, 2, \dots, n\}$ for an integer $n$. Unless some
ambiguity emerges, we often abbreviate a set $\{i_1, \dots, i_k\}$ to $i_1 \cdots i_k$. Let $\delta_{a,b}$ denote the Kronecker delta,
i.e., we have $\delta_{a,b} = 1$ if $a = b$ and $\delta_{a,b} = 0$ if $a \neq b$. For a family $\mathcal{F}$ of sets, let $\bigcup \mathcal{F}$ and $\bigcap \mathcal{F}$ denote the union
and the intersection, respectively, of all members of $\mathcal{F}$.

1.4 Organization of the Article 

In Sect. 2, we give a formal definition of the notion of collusion-secure fingerprint codes. In Sect. 3, we describe
our codeword generation algorithm and pirate tracing algorithm, state the main results on the security of our
3-secure codes, and give some numerical examples for comparison to preceding works. Section 4 summarizes
the outline of the security proof. In Sect. 5, we discuss an implementation issue of our tracing algorithm.
Finally, Sect. 6 supplies the details of our security proof omitted in Sect. 4.
2 Collusion-Secure Fingerprint Codes



In this section, we introduce formal definitions for fingerprint codes. Let $N$ and $m$ be positive integers, and
$1 \le c < N$ an integer parameter. Put $U = [N]$. Fix a symbol '?' different from '0' and '1'. We start with
the following definition:

Definition 1. Given the parameters $N$, $m$ and $c$, we define the following game, which we refer to as the pirate
tracing game. The players of the game are a provider and pirates, and the game proceeds as follows:

1. Provider generates an $N \times m$ binary matrix $W = (w_{i,j})_{i \in [N], j \in [m]}$ and an element $st$ called state
information.

2. Pirates generate $U_P \subseteq U$, $1 \le |U_P| \le c$, without knowing $W$ and $st$.

3. Pirates receive the codeword $w_i = (w_{i,1}, \dots, w_{i,m})$ for every $i \in U_P$.

4. Pirates generate a word $y = (y_1, \dots, y_m)$ on $\{0, 1, ?\}$ under a certain restriction specified below, and
send $y$ to provider.

5. Provider generates $\mathrm{Acc} \subseteq U$ from $y$, $W$, and $st$, without knowing $U_P$.

6. Then pirates win if $\mathrm{Acc} \cap U_P = \emptyset$ or $\mathrm{Acc} \not\subseteq U_P$, and otherwise provider wins.

We call the word $y$ in Step 4 an attack word and call '?' an erasure symbol. Put $U_I = U \setminus U_P$. In the
definition, $U$ signifies the set of all users, $U_P$ is the coalition of pirates, and $U_I$ is the set of innocent users.
The codeword $w_i$ signifies the fingerprint for user $i$, and the word $y$ signifies the fingerprint embedded in
the pirated content. The set Acc consists of the users traced by the provider from the pirated content. The
events $\mathrm{Acc} \cap U_P = \emptyset$ and $\mathrm{Acc} \not\subseteq U_P$ specified in Step 6 are referred to as false-negative and false-positive (or
false-alarm), respectively. Both false-negative and false-positive are called tracing errors.

Let Gen, Reg, $\rho$, and Tr denote the algorithms used in Steps 1, 2, 4, and 5, respectively. We call Gen,
Reg, $\rho$, and Tr the codeword generation algorithm, registration algorithm, pirate strategy, and tracing algorithm,
respectively. We refer to the pair $\mathcal{C} = (\mathrm{Gen}, \mathrm{Tr})$ as a fingerprint code, and the following quantity

    $\Pr[(W, st) \leftarrow \mathrm{Gen}();\ U_P \leftarrow \mathrm{Reg}();\ y \leftarrow \rho(U_P, (w_i)_{i \in U_P});\ \mathrm{Acc} \leftarrow \mathrm{Tr}(y, W, st) : \mathrm{Acc} \cap U_P = \emptyset \text{ or } \mathrm{Acc} \not\subseteq U_P]$    (1)

(i.e., the overall probability that pirates win) is called the error probability of $\mathcal{C}$.

We specify the restriction for $y$ mentioned in Step 4. First we present some terminology. For $j \in [m]$, the $j$-th
column in codewords is called undetectable if the $j$-th bits $w_{i,j}$ of the codewords $w_i$ of pirates $i \in U_P$ coincide with
each other; otherwise the column is called detectable. Then, in this article, we put the following standard
assumption called Marking Assumption [2]:

Definition 2. The Marking Assumption states the following: For the attack word $y$, for every undetectable
column $j$, we have $y_j = w_{i,j}$ for some (or equivalently, all) $i \in U_P$.
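The Marking Assumption can be checked mechanically for a given coalition. The following is a minimal sketch, assuming that codewords and attack words are represented as Python strings over '0', '1' and '?'; the function names are illustrative and do not come from the article.

```python
def undetectable_columns(pirate_codewords):
    """Return the column indices in which all pirates' bits coincide."""
    m = len(pirate_codewords[0])
    return [j for j in range(m)
            if len({w[j] for w in pirate_codewords}) == 1]


def satisfies_marking_assumption(y, pirate_codewords):
    """Check that y agrees with the pirates on every undetectable column.

    An erasure symbol '?' in an undetectable column is rejected as well,
    because the assumption requires y_j = w_{i,j} there.
    """
    first = pirate_codewords[0]
    return all(y[j] == first[j]
               for j in undetectable_columns(pirate_codewords))


if __name__ == "__main__":
    pirates = ["0101", "0011", "0110"]
    print(undetectable_columns(pirates))
    print(satisfies_marking_assumption("0?10", pirates))
```

In detectable columns the attack word is unconstrained, which is why only the undetectable columns are inspected.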

We say that a fingerprint code $\mathcal{C}$ is collusion-secure if the error probability of $\mathcal{C}$ is sufficiently small for
any Reg and $\rho$ under Marking Assumption. More precisely, we say that $\mathcal{C}$ is c-secure (with $\varepsilon$-error) [2] if the
error probability is not higher than a sufficiently small value $\varepsilon$ under Marking Assumption.

3 Our 3-Secure Codes 

Here we propose a codeword generation algorithm Gen and a tracing algorithm Tr for 3-secure codes (c = 3).
The security property will be discussed below.

The algorithm Gen, with parameter $1/2 \le p < 1$, is the codeword generation algorithm of Tardos codes
[14] but with a different probability distribution of biases: For each (say, $j$-th) column, each user's bit $w_{i,j}$
is independently chosen by $\Pr[w_{i,j} = 1] = p_j$, where $p_j = p$ or $1 - p$ with probability 1/2 each. Then Gen
outputs $W = (w_{i,j})_{i \in [N], j \in [m]}$ and $st = (p_j)_{j \in [m]}$.
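The generation step is simple enough to sketch directly. The code below is an illustration only, assuming a list-of-lists representation for $W$ and Python's standard pseudo-random generator (which is not suitable for a real deployment); the name `gen` and its signature are hypothetical.

```python
import random


def gen(N, m, p, rng=None):
    """Sample an N x m codeword matrix W and the state information st.

    For each column j, the bias p_j is p or 1 - p with probability 1/2 each,
    and every user's bit in that column is 1 with probability p_j.
    """
    rng = rng or random.Random()
    biases = [p if rng.random() < 0.5 else 1.0 - p for _ in range(m)]
    W = [[1 if rng.random() < biases[j] else 0 for j in range(m)]
         for _ in range(N)]
    return W, biases


if __name__ == "__main__":
    W, st = gen(N=5, m=8, p=0.7, rng=random.Random(0))
    for row in W:
        print(row)
    print(st)
```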

To describe the algorithm Tr, we introduce some notations. For binary words $w^{(1)}, \dots, w^{(k)}$ of length $m$,
we define

    $\mathcal{E}(w^{(1)}, \dots, w^{(k)}) = \{ y \in \{0,1\}^m \mid y_j \in \{w^{(1)}_j, \dots, w^{(k)}_j\} \text{ for every } j \in [m] \}$ ,    (2)

the envelope of $w^{(1)}, \dots, w^{(k)}$. Then for a binary word $y$ of length $m$ and a collection $W = (w_{i,j})$ of codewords
of users, we define

    $\mathcal{T}(y) = \{ i_1 i_2 i_3 \subseteq U \mid i_1 \neq i_2 \neq i_3 \neq i_1,\ y \in \mathcal{E}(w_{i_1}, w_{i_2}, w_{i_3}) \}$    (3)

(see Sect. 1.3 for the notation $i_1 i_2 i_3$). A key property implied by Marking Assumption is that if the attack
word $y$ contains no erasure symbols, then $y$ belongs to the envelope of the codewords of pirates and, if
furthermore $|U_P| = 3$, the family $\mathcal{T}(y)$ contains the set of three pirates. By using these notations, we define
the algorithm Tr as follows, where the words $y, w_1, \dots, w_N$ and the state information $st = (p_j)_{j \in [m]}$ are
given:

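1. Replace each erasure symbol '?' in $y$ with '0' or '1' independently in the following manner. If $y_j = {?}$,
then it is replaced with '1' with probability $p_j$, and with '0' with probability $1 - p_j$. Let $y'$ denote the
resulting word.

2. Calculate a threshold parameter $Z = Z_{y'}$ as specified below.

3. For each $i \in U$, calculate the score $S(i)$ of $i$ by

    $S(i) = \sum_{j \in [m],\ y'_j = 1} \delta_{w_{i,j}, y'_j} \log\frac{1}{p_j} + \sum_{j \in [m],\ y'_j = 0} \delta_{w_{i,j}, y'_j} \log\frac{1}{1 - p_j}$ .    (4)

4. If $S(i) \ge Z$ for some $i \in U$, then output every $i \in U$ such that $S(i) \ge Z$, and halt.

5. Calculate $\mathcal{T}' = \{T \in \mathcal{T}(y') \mid T \cap T_0 \neq \emptyset \text{ for every } T_0 \in \mathcal{T}(y')\}$. If $\mathcal{T}' = \emptyset$, then output nobody, and
halt.

6. If $\bigcap \mathcal{T}' \neq \emptyset$, then output every member of $\bigcap \mathcal{T}'$, and halt.

7. Calculate $\mathcal{P} = \{P = i_1 i_2 \subseteq U \mid i_1 \neq i_2,\ P \cap T \neq \emptyset \text{ for every } T \in \mathcal{T}'\}$. Let $\mathcal{P}_k$ be the set of all $i \in U$
such that $|\{P \in \mathcal{P} \mid i \in P\}| = k$.

8. If $\mathcal{P}_1 \neq \emptyset$, then output every $i \in U$ such that $ii' \in \mathcal{P}$ for some $i' \in \mathcal{P}_1$, and halt.

9. If $|\mathcal{P}| = 7$, then output every $i \in U$ such that $ii' \in \mathcal{P}$ for some $i' \in \mathcal{P}_2$, and halt.

10. If $|\mathcal{P}| = 6$, then output every $i \in \mathcal{P}_3$, and halt.

11. If $|\mathcal{P}| = 5$ and $\mathcal{T}'' = \{i_1 i_2 i_3 \in \mathcal{T}' \mid i_1 i_2, i_2 i_3, i_1 i_3 \in \mathcal{P}\} \neq \emptyset$, then output every member of $\mathcal{P}_2 \cap (\bigcup \mathcal{T}'')$,
and halt.

12. If $|\mathcal{P}| = 5$ and $\mathcal{T}'' = \emptyset$, then output every $i \in \bigcup \mathcal{P}$ such that $ii' \notin \mathcal{P}$ for some $i' \in \bigcup \mathcal{P}$, and halt.

13. If $|\mathcal{P}| = 4$, then output every $i \in \bigcup \mathcal{P}$ such that $T \in \mathcal{T}'$ and $T \subseteq \bigcup \mathcal{P}$ imply $i \in T$, and halt.

14. If $|\mathcal{P}| = 3$, then output every $i \in \bigcup \mathcal{P}$, and halt.

15. Output nobody, and halt.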
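The score-based part of the algorithm (Steps 1, 3 and 4) can be sketched as follows. This is only an illustration under assumed conventions: the attack word is a list whose entries are the integers 0 and 1 or the string '?', the threshold `Z` is supplied by the caller, and all function names are hypothetical.

```python
import math
import random


def fill_erasures(y, biases, rng):
    """Step 1: replace each '?' by 1 with probability p_j, otherwise by 0."""
    return [(1 if rng.random() < pj else 0) if yj == '?' else yj
            for yj, pj in zip(y, biases)]


def score(w, y_filled, biases):
    """Step 3, Eq. (4): only the columns where w and y' coincide contribute."""
    s = 0.0
    for wj, yj, pj in zip(w, y_filled, biases):
        if wj == yj:
            s += math.log(1.0 / pj) if yj == 1 else math.log(1.0 / (1.0 - pj))
    return s


def accuse_by_score(W, y_filled, biases, Z):
    """Step 4: indices (0-based) of the users whose score reaches Z."""
    return [i for i, w in enumerate(W) if score(w, y_filled, biases) >= Z]


if __name__ == "__main__":
    biases = [0.5] * 4
    W = [[1, 0, 0, 1], [0, 1, 1, 0]]
    y_filled = fill_erasures([1, 0, '?', 1], biases, random.Random(1))
    print(accuse_by_score(W, y_filled, biases, Z=2.5 * math.log(2)))
```

If the returned list is non-empty the tracing halts there; otherwise Steps 5 to 15 take over.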
This algorithm is divided into two parts: Steps 1 to 4 and the remaining steps. The former part aims at
performing coarse tracing to counter "unbalanced" pirate strategies; namely, if some pirates' codewords contribute
to generating $y$ at many more columns than those of the other pirates, then it is very likely that the scores of such pirates
exceed the threshold and they are correctly accused in Step 4.

On the other hand, the latter, more complicated part aims at performing more refined tracing. First, the
algorithm enumerates the collections of three users such that $y'$ can be made (under Marking Assumption)
from their codewords; in other words, each such collection is a candidate for the actual triple of pirates. Steps 5
and 6 are designed according to the intuition that a pirate is very likely to be contained in many more
candidate triples than an innocent user. When the tracing algorithm has not halted by Step 6, the possible
"structures" of the set $\mathcal{T}'$ are mostly limited, even allowing us to enumerate all the possibilities. However,
it is space-consuming to enumerate them and determine suitable outputs in a case-by-case manner. Instead,
we give an explicit algorithm (Steps 7 to 15) to determine a suitable output, which is artificial but not too
space-consuming. Some examples of the possibilities of $\mathcal{T}'$ are given in Fig. 1, where 1, 2, 3 are the pirates,
$i_j$ are innocent users and the members of $\mathcal{T}'$ are denoted by triangles.




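[Figure 1, three panels; the drawings of the triangles are not reproduced, only the panel annotations]

Panel 1: $\mathcal{P} = \{12, 13, 23, 3i_1, 2i_2\}$, $\mathcal{P}_1 = \{i_1, i_2\}$, output = 2, 3

Panel 2: $\mathcal{P} = \{13, 1i_1, 1i_2, 23, 2i_1, 2i_2, 3i_1\}$, $\mathcal{P}_1 = \emptyset$, $\mathcal{P}_2 = \{i_2\}$, output = 1, 2

Panel 3: $\mathcal{P} = \{13, 1i_1, 23, 2i_1, 3i_1\}$, $\mathcal{P}_1 = \emptyset$, $\mathcal{T}'' = \emptyset$, $\bigcup \mathcal{P} = \{1, 2, 3, i_1\}$, $12 \notin \mathcal{P}$, output = 1, 2

Figure 1: Examples of the sets $\mathcal{T}'$ and $\mathcal{P}$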

For the latter part, the tracing tends to fail in the case that the set $\mathcal{T}(y')$ contains many members
other than the triple of the pirates, which tends to occur when the contributions of the pirates' codewords
to $y$ are too unbalanced. However, such an unbalanced attack is countered by the former part, therefore the
latter part also works well. More precisely, an upper bound of the error probability at the latter part will be
derived by using the property that scores of pirates are lower than the threshold (as otherwise the tracing
halts at the former part); cf., Sect. 6.5. Our scoring function (4), which is different from the ones for Tardos
codes [14] and its symmetrized version [12], is adopted to simplify the derivation process. Although it is
possible that the true error probability is reduced by applying the preceding scoring functions, a proof of a
bound of error probability with those scoring functions requires another evaluation technique and would be
much more involved; this is a future research topic.

Note that, for the case p = 1/2, it is known that the "minority vote" by three pirates for generating y 
cancels the mutual information between y and a single codeword, therefore the pirates are likely to escape 
from the former part of Tr. However, even by such a strategy the pirates are unlikely to escape from the 
latter part of Tr, as collections of users rather than individual users are considered there. 
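The remark on the minority vote can be verified by exhaustive enumeration. The sketch below assumes $p = 1/2$, so that the eight possible bit patterns of three pirates in one column are equally likely, and it interprets "minority vote" as outputting the less frequent bit in every detectable column; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def minority_vote(bits):
    """Attack bit for one column, given the three pirates' bits."""
    ones = sum(bits)
    if ones == 0 or ones == 3:
        # Undetectable column: the Marking Assumption forces the common bit.
        return bits[0]
    # Detectable column: output the bit held by only one pirate.
    return 1 if ones == 1 else 0


def match_probability(pirate):
    """Pr[y_j = w_{pirate,j}] over the 8 equally likely columns (p = 1/2)."""
    hits = sum(1 for bits in product((0, 1), repeat=3)
               if minority_vote(bits) == bits[pirate])
    return Fraction(hits, 8)


if __name__ == "__main__":
    print([match_probability(i) for i in range(3)])
```

Each pirate's bit agrees with the attack bit with probability exactly 1/2, the same as for an innocent user, so the expected score of a pirate does not exceed that of an innocent user under this strategy.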

The threshold parameter $Z = Z_{y'}$ in Step 2 is determined as follows. Let $A_H$ be the set of column indices
$j$ such that $(p_j, y'_j) = (p, 1)$ or $(1 - p, 0)$, i.e., the occurrence probability of the bit $y'_j \in \{0, 1\}$ at the $j$-th column is
$p \ge 1/2$, and let $A_L = [m] \setminus A_H$. Put $a_H = |A_H|$ and $a_L = |A_L|$. Choose a parameter $\varepsilon_0 > 0$ which is smaller
than the desired bound $\varepsilon$ of error probability. Then choose $Z = Z_{y'}$ satisfying the following condition:

    $\sum_{(k_H, k_L)} \binom{a_H}{k_H} p^{k_H} (1 - p)^{a_H - k_H} \binom{a_L}{k_L} (1 - p)^{k_L} p^{a_L - k_L} \le \frac{\varepsilon_0}{N}$ ,    (5)

where the sum runs over all integers $k_H, k_L \ge 0$ such that $k_H \log\frac{1}{p} + k_L \log\frac{1}{1 - p} \ge Z$. An example of a
concrete choice of $Z$ satisfying the condition (5) is as follows:

    $Z_0 = a_H\, p \log\frac{1}{p} + a_L (1 - p) \log\frac{1}{1 - p} + \sqrt{\frac{1}{2} \left( a_H \left(\log\frac{1}{p}\right)^2 + a_L \left(\log\frac{1}{1 - p}\right)^2 \right) \log\frac{N}{\varepsilon_0}}$    (6)

(see Sect. 6.1 for the proof). From now on, we suppose that the threshold $Z$ satisfies the condition (5) and
$Z \le Z_0$.
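A small numerical check may help in reading the condition (5) and the choice (6). The sketch below assumes the forms of (5) and (6) stated above; `z0` evaluates the threshold of (6) and `tail_probability` evaluates the left-hand side of (5) by direct summation. Both names and the sample parameters are illustrative only.

```python
import math


def z0(a_H, a_L, p, N, eps0):
    """Concrete threshold of Eq. (6): mean score plus a Hoeffding margin."""
    lh, ll = math.log(1.0 / p), math.log(1.0 / (1.0 - p))
    mu = a_H * p * lh + a_L * (1.0 - p) * ll
    eta = math.sqrt(0.5 * (a_H * lh ** 2 + a_L * ll ** 2) * math.log(N / eps0))
    return mu + eta


def tail_probability(a_H, a_L, p, Z):
    """Left-hand side of condition (5), summed over all admissible (k_H, k_L)."""
    lh, ll = math.log(1.0 / p), math.log(1.0 / (1.0 - p))
    total = 0.0
    for k_H in range(a_H + 1):
        prob_H = math.comb(a_H, k_H) * p ** k_H * (1.0 - p) ** (a_H - k_H)
        for k_L in range(a_L + 1):
            if k_H * lh + k_L * ll >= Z:
                total += prob_H * math.comb(a_L, k_L) * (1.0 - p) ** k_L * p ** (a_L - k_L)
    return total


if __name__ == "__main__":
    a_H, a_L, p, N, eps0 = 30, 20, 0.6, 100, 0.01
    Z = z0(a_H, a_L, p, N, eps0)
    print(Z, tail_probability(a_H, a_L, p, Z), eps0 / N)
```

The direct summation costs on the order of $a_H a_L$ operations, so it is practical only for moderate code lengths; the closed form (6) avoids it at the price of a somewhat larger threshold.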

For the security of the proposed fingerprint code, first we present the following result, which will be
proven in Sect. 4:

Theorem 1. By the above choice of $\varepsilon_0$ and $Z$, if the number of pirates is three, then the error probability
of the proposed fingerprint code is lower than

    $\varepsilon_0 + \binom{N-3}{3} f_1(p)^m + 3(N-3)(N-4) f_2(p)^m + (N-3)(1-p)^{-3\sqrt{(m/2)\log(N/\varepsilon_0)}} f_3(p)^m$ ,    (7)

where we put

    $f_1(p) = 1 - 3p^2 + 10p^3 - 15p^4 + 12p^5 - 4p^6$ ,
    $f_2(p) = p^2 (1-p)^2 (\sqrt{p} + \sqrt{1-p}) + 1 - p - p^2 + 4p^3 - 2p^4$ ,    (8)
    $f_3(p) = p^{4-3p} (p^2 - 3p + 3) + (1-p)^{3p+1} (p^2 + p + 1)$ .

Some numerical analysis suggests that the choice $p = 1/2$ would be optimal (or at least pretty good) to
decrease the bound of error probabilities specified in Theorem 1. In fact, an elementary analysis shows that
the second term $\binom{N-3}{3} f_1(p)^m$ in the sum, which seems dominant (cf., Theorem 2 below), takes the minimum
over $p \in [1/2, 1)$ at $p = 1/2$. Hence we use $p = 1/2$ in the following argument. Now it is shown that the error
probability against fewer than three pirates also has the same bound under a condition (10) below (which
seems trivial in practical situations), therefore we have the following (which will be proven in Sect. 4):

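Theorem 2. By using the value $p = 1/2$, the proposed fingerprint code is 3-secure with error probability
lower than

    $\varepsilon_0 + \binom{N-3}{3} \left(\frac{7}{8}\right)^m + 3(N-3)(N-4) \left(\frac{10 + \sqrt{2}}{16}\right)^m + (N-3)\, 8^{\sqrt{(m/2)\log(N/\varepsilon_0)}} \left(\frac{7\sqrt{2}}{16}\right)^m$ ,    (9)

provided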



    $m \ge 8 \log\frac{N}{\varepsilon_0} \left( 1 + \frac{1}{16 \log(N/\varepsilon_0)} \right)$ .    (10)



Note that when $p = 1/2$, the score $S(i)$ of a user $i$ is equal to $\log 2$ times the number of columns in which
the words $w_i$ and $y'$ coincide. Hence the calculation of scores can be made easier by using the "normalized"
score $\bar{S}(i) = S(i)/\log 2$ instead, which is equal to $m$ minus the Hamming distance of $w_i$ from $y'$, together
with the "normalized" threshold $Z_0/\log 2 = m/2 + \sqrt{(m/2)\log(N/\varepsilon_0)}$.

Table 1 shows a comparison of our code lengths (numerically calculated by using Theorem 2) with the 3-secure
codes by Sebe and Domingo-Ferrer [13]. Table 2 shows the comparison with the c-secure codes by Nuida et al.
[10] for c = 3. The values of $N$ and $\varepsilon$ and the corresponding code lengths are chosen from those articles.
The tables show that our code lengths are much shorter than the codes in [13], and even significantly shorter
than the codes in [10], which are in fact the shortest c-secure codes known so far (improving the celebrated
Tardos codes [14]). On the other hand, recently Kitagawa et al. [5] proposed another construction of 3-secure
codes, and evaluated the security against some typical pirate strategies in the case $N = 100$ and $m = 128$
by computer experiment. The resulting error probability was $\varepsilon = 0.009$. For the same error probability, our
code length (with parameter $\varepsilon_0 = \varepsilon/2$) is $m = 135$. Therefore our code, which is provably secure in contrast
to their code, has almost the same length as their code.
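The code lengths quoted here can be recomputed from the bound of Theorem 1 evaluated at $p = 1/2$. The sketch below assumes the forms of $f_1$, $f_2$, $f_3$ and of the bound as given in Eqs. (7) and (8); it searches for the smallest $m$ whose bound falls below $\varepsilon$ and does not check the side condition (10). The function names are illustrative.

```python
import math


def f1(p):
    return 1 - 3 * p**2 + 10 * p**3 - 15 * p**4 + 12 * p**5 - 4 * p**6


def f2(p):
    return (p**2 * (1 - p)**2 * (math.sqrt(p) + math.sqrt(1 - p))
            + 1 - p - p**2 + 4 * p**3 - 2 * p**4)


def f3(p):
    return p**(4 - 3 * p) * (p**2 - 3 * p + 3) + (1 - p)**(3 * p + 1) * (p**2 + p + 1)


def error_bound(N, m, eps0, p=0.5):
    """Right-hand side of Eq. (7); for p = 1/2 it specializes to Eq. (9)."""
    root = math.sqrt((m / 2) * math.log(N / eps0))
    return (eps0
            + math.comb(N - 3, 3) * f1(p)**m
            + 3 * (N - 3) * (N - 4) * f2(p)**m
            + (N - 3) * (1 - p)**(-3 * root) * f3(p)**m)


def shortest_length(N, eps, eps0, p=0.5, m_max=100000):
    """Smallest m with error_bound < eps, or None if none is found up to m_max."""
    for m in range(1, m_max + 1):
        if error_bound(N, m, eps0, p) < eps:
            return m
    return None


if __name__ == "__main__":
    print(shortest_length(100, 0.009, 0.0045))
    print(shortest_length(128, 0.14e-6, 0.07e-6))
```

Under these assumptions the two calls correspond to the comparison with Kitagawa et al. ($N = 100$, $\varepsilon = 0.009$, $\varepsilon_0 = \varepsilon/2$) and to the first column of Table 1.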
Table 1: Comparison of code lengths with the codes by Sebe and Domingo-Ferrer [13]

    $N$                     128                     256                      512
    $\varepsilon$           $0.14 \times 10^{-6}$   $0.15 \times 10^{-13}$   $0.19 \times 10^{-27}$
    $m$ (codes in [13])     6985                    14025                    28105
    $m$ (our code)          282                     502                      934
    ($\varepsilon_0 =$)     $(1/2)\varepsilon$      $(7/10)\varepsilon$      $(7/10)\varepsilon$
    ratio                   4.04%                   3.58%                    3.32%

Table 2: Comparison of code lengths with the codes by Nuida et al. [10] (c = 3)

    $N$                     300                     $10^9$                   $10^6$
    $\varepsilon$           $10^{-11}$              $10^{-6}$                $10^{-3}$
    $m$ (codes in [10])     1309                    1423                     877
    $m$ (our code)          420                     556                      349
    ($\varepsilon_0 =$)     $(9/10)\varepsilon$     $(1/100)\varepsilon$     $(1/100)\varepsilon$
    ratio                   32.1%                   39.1%                    39.8%



4 Security Proof 

In this section, we present an outline of the proof of Theorems 1 and 2. Omitted details of the proof will be
supplied in Sect. 6.

First, we present some properties of the threshold parameter $Z = Z_{y'}$, which will be proven in Sect. 6.1:

Proposition 1. 1. If $Z$ satisfies the condition (5), then the conditional probability that $S(I) \ge Z$ for
some $I \in U_I$, conditioned on the choice of $y'$, is not higher than $(N - 1)\varepsilon_0/N$.

2. The value $Z = Z_0$ in (6) satisfies the condition (5).

To prove Theorem 1, we consider the case that the number of pirates $|U_P|$ is three. By symmetry, we
may assume that $U_P = \{1, 2, 3\}$. Put $T_P = 123$, therefore we have $T_P \in \mathcal{T}(y')$ by Marking Assumption. Now
we consider the following four kinds of events:

Type I error: $S(I) \ge Z$ for some innocent user $I \in U_I$.

Type II error: $T \cap T_P = \emptyset$ for some $T \in \mathcal{T}(y')$.

Type III error: There are $T_1, T_2 \in \mathcal{T}(y')$ such that $\emptyset \neq T_1 \cap T_2 \subseteq U_I$, $|T_1 \cap T_P| = 1$ and $|T_2 \cap T_P| = 1$.

Type IV error: $S(i) < Z$ for every $i \in \{1, 2, 3\}$, and there is an innocent user $I$ such that $12I \in \mathcal{T}(y')$,
$13I \in \mathcal{T}(y')$ and $23I \in \mathcal{T}(y')$.

Then we have the following property, which will be proven in Sect. 6.2:

Proposition 2. If $|U_P| = 3$, then tracing error occurs only when one of the Type I, II, III and IV errors
occurs.

By this proposition, the error probability is bounded by the sum of the probabilities of Type I to IV
errors. By Proposition 1, the probability of Type I error is bounded by $\varepsilon_0$. Now Theorem 1 is proven by
combining this with the following three propositions, which will be proven in Sect. 6.3, Sect. 6.4 and Sect. 6.5,
respectively (see (8) for the notations):

Proposition 3. If $|U_P| = 3$, then the probability of Type II error is not higher than $\binom{N-3}{3} f_1(p)^m$.

Proposition 4. If $|U_P| = 3$, then the probability of Type III error is not higher than $3(N-3)(N-4) f_2(p)^m$.

Proposition 5. If $|U_P| = 3$ and the threshold $Z$ is chosen so that the condition (5) holds and $Z \le Z_0$, then
the probability of Type IV error is lower than $(N-3)(1-p)^{-3\sqrt{(m/2)\log(N/\varepsilon_0)}} f_3(p)^m$.

To prove Theorem 2, we set $p = 1/2$. Then the bound of error probability given by Theorem 1 is
specialized to the value specified in Theorem 2. Hence our remaining task is to evaluate the error probabilities
for the case that the number of pirates is one or two.

First we consider the case that there are exactly two pirates, say, $1, 2 \in U$. The key property is the
following, which will be proven in Sect. 6.6:

Proposition 6. In this situation, if the condition (10) is satisfied, then the probability that $S(1) < Z$ and
$S(2) < Z$ is lower than $\varepsilon_0/N$.

By this proposition, when the condition (10) is satisfied, at least one of the two pirates is output in Step
4 of the tracing algorithm with probability not lower than $1 - \varepsilon_0/N$. On the other hand, by Proposition 1,
some innocent user is output in Step 4 with probability not higher than $(N - 1)\varepsilon_0/N$. Hence in Step 4 at
least one pirate and no innocent users are output with probability not lower than $1 - \varepsilon_0$. This implies that
the error probability is bounded by $\varepsilon_0$ in this case.

Secondly, we consider the case that there is exactly one pirate, say, $1 \in U$. Then we have the following
property, which will be proven in Sect. 6.7:

Proposition 7. In this situation, if $m \ge 2\log(N/\varepsilon_0)$, then the score $S(1)$ of the pirate is always higher than
or equal to $Z_0$.

By this proposition, when the condition (10) is satisfied, the pirate is always output in Step 4 of the
tracing algorithm. Hence by the same argument as in the previous paragraph, the error probability is bounded
by $\varepsilon_0$ in this case as well. Summarizing, the proof of Theorem 2 is concluded.



5 On implementation of the tracing algorithm 

In this section, we discuss an implementation issue of the tracing algorithm Tr of the proposed 3-secure
code. More precisely, we consider the calculation of the set $\mathcal{T}(y')$ appearing in Step 5 of Tr. By a naive
calculation method based on the definition (3) of $\mathcal{T}(y')$, we need to check the condition $y' \in \mathcal{E}(w_{i_1}, w_{i_2}, w_{i_3})$
for every triple $i_1 i_2 i_3$ of users, therefore the time complexity with respect to the user number $N$ is inevitably
$\Omega(N^3)$. As this complexity is larger than that of the tracing algorithms of many other c-secure codes such as Tardos
codes [14], it is important to reduce the complexity of calculation of $\mathcal{T}(y')$.

To calculate the collection $\mathcal{T}(y')$, we consider the following algorithm, with codewords $w_1, \dots, w_N$ and
the $m$-bit word $y'$ as input:

1. Set $\mathcal{L}_1^{(1)} = \{i \in [N] \mid w_{i,1} = y'_1\}$ and $\mathcal{L}_2^{(1)} = \mathcal{L}_3^{(1)} = \emptyset$.

2. For each $2 \le j \le m$, construct $\mathcal{L}_1^{(j)}$, $\mathcal{L}_2^{(j)}$ and $\mathcal{L}_3^{(j)}$ inductively, in the following manner. (At the
beginning, set $\mathcal{L}_1^{(j)} = \mathcal{L}_2^{(j)} = \mathcal{L}_3^{(j)} = \emptyset$.)

(a) Put $C_j = \{i \in [N] \mid w_{i,j} = y'_j\}$.

(b) Set $\mathcal{L}_1^{(j)} = \mathcal{L}_1^{(j-1)} \cap C_j$.

(c) Add the pair $\{\mathcal{L}_1^{(j-1)} \setminus C_j,\ C_j \setminus \mathcal{L}_1^{(j-1)}\}$ of subsets of $[N]$ to $\mathcal{L}_2^{(j)}$.

(d) For each pair $\{K_1, K_2\}$ of subsets of $[N]$ in $\mathcal{L}_2^{(j-1)}$,

• add two pairs $\{K_1 \cap C_j, K_2\}$ and $\{K_1 \setminus C_j, K_2 \cap C_j\}$ to $\mathcal{L}_2^{(j)}$;

• add the triple $\{K_1 \setminus C_j,\ K_2 \setminus C_j,\ C_j \setminus (K_1 \cup K_2)\}$ of subsets of $[N]$ to $\mathcal{L}_3^{(j)}$.

(e) For each triple $\{K_1, K_2, K_3\}$ of subsets of $[N]$ in $\mathcal{L}_3^{(j-1)}$, add three triples $\{K_1 \cap C_j, K_2, K_3\}$,
$\{K_1 \setminus C_j, K_2 \cap C_j, K_3\}$, $\{K_1 \setminus C_j, K_2 \setminus C_j, K_3 \cap C_j\}$ to $\mathcal{L}_3^{(j)}$.

(f) Remove from $\mathcal{L}_2^{(j)}$ every pair $\{K_1, K_2\}$ with $K_1$ or $K_2$ being empty, and from $\mathcal{L}_3^{(j)}$ every triple
$\{K_1, K_2, K_3\}$ with $K_1$, $K_2$ or $K_3$ being empty.

3. Output the collection of the triples $T = i_1 i_2 i_3$ of distinct numbers $i_1, i_2, i_3$ satisfying one of the following
conditions:

• we have $i_1 \in \mathcal{L}_1^{(m)}$ and $i_2, i_3$ are arbitrary;

• for some $\{K_1, K_2\} \in \mathcal{L}_2^{(m)}$, we have $i_1 \in K_1$, $i_2 \in K_2$ and $i_3$ is arbitrary;

• for some $\{K_1, K_2, K_3\} \in \mathcal{L}_3^{(m)}$, we have $i_1 \in K_1$, $i_2 \in K_2$ and $i_3 \in K_3$.
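The procedure above can be sketched in a few lines and compared against the naive enumeration based on definition (3). The code below is an illustration under assumed conventions (0-based user indices, codewords as lists of bits, pairs and triples stored as ordered tuples of frozensets); the function names are hypothetical and no claim is made about the performance of this particular representation.

```python
from itertools import combinations


def in_envelope(y, words):
    """y lies in the envelope if every bit of y appears in that column of some word."""
    return all(any(w[j] == y[j] for w in words) for j in range(len(y)))


def triples_naive(W, y):
    """Definition (3): test every triple of users directly."""
    return {frozenset(t) for t in combinations(range(len(W)), 3)
            if in_envelope(y, [W[i] for i in t])}


def triples_incremental(W, y):
    """Column-by-column construction following Steps 1 to 3 above."""
    users = frozenset(range(len(W)))
    L1 = frozenset(i for i in users if W[i][0] == y[0])
    L2, L3 = set(), set()
    for j in range(1, len(y)):
        C = frozenset(i for i in users if W[i][j] == y[j])
        new2 = {(L1 - C, C - L1)}                          # step (c)
        new3 = set()
        for K1, K2 in L2:                                  # step (d)
            new2.add((K1 & C, K2))
            new2.add((K1 - C, K2 & C))
            new3.add((K1 - C, K2 - C, C - (K1 | K2)))
        for K1, K2, K3 in L3:                              # step (e)
            new3.add((K1 & C, K2, K3))
            new3.add((K1 - C, K2 & C, K3))
            new3.add((K1 - C, K2 - C, K3 & C))
        L1 = L1 & C                                        # step (b)
        L2 = {t for t in new2 if all(t)}                   # step (f)
        L3 = {t for t in new3 if all(t)}
    out = set()
    for i1 in L1:                                          # Step 3, first case
        for rest in combinations(users - {i1}, 2):
            out.add(frozenset((i1,) + rest))
    for K1, K2 in L2:                                      # Step 3, second case
        for i1 in K1:
            for i2 in K2:
                for i3 in users - {i1, i2}:
                    out.add(frozenset((i1, i2, i3)))
    for K1, K2, K3 in L3:                                  # Step 3, third case
        for i1 in K1:
            for i2 in K2:
                for i3 in K3:
                    t = frozenset((i1, i2, i3))
                    if len(t) == 3:
                        out.add(t)
    return out


if __name__ == "__main__":
    W = [[0, 1, 1, 0, 1], [1, 1, 0, 0, 1], [0, 0, 1, 1, 0],
         [1, 0, 0, 1, 1], [0, 1, 0, 1, 0], [1, 1, 1, 1, 0]]
    y = [0, 1, 0, 1, 1]
    print(triples_incremental(W, y) == triples_naive(W, y))
```

The final expansion into explicit triples is kept here only so that the result can be compared with the naive method; a tracer could instead work with the compact families directly.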

An inductive argument shows that, for each $j \in [m]$ and each triple of distinct $i_1, i_2, i_3$, the $j$-bit initial
subword $(y'_1, \dots, y'_j)$ of $y'$ is in the envelope of the $j$-bit initial subwords of $w_{i_1}, w_{i_2}, w_{i_3}$ if and only if one of
the following conditions is satisfied (note that the order of members of a pair or triple is ignored):

• we have $i_1 \in \mathcal{L}_1^{(j)}$ and $i_2, i_3$ are arbitrary;

• for some $\{K_1, K_2\} \in \mathcal{L}_2^{(j)}$, we have $i_1 \in K_1$, $i_2 \in K_2$ and $i_3$ is arbitrary;

• for some $\{K_1, K_2, K_3\} \in \mathcal{L}_3^{(j)}$, we have $i_1 \in K_1$, $i_2 \in K_2$ and $i_3 \in K_3$.

By setting $j = m$, it follows that the above algorithm outputs $\mathcal{T}(y')$ correctly.

Now for each $2 \le j \le m$, the complexity of computing $\mathcal{L}_1^{(j)}$, $\mathcal{L}_2^{(j)}$ and $\mathcal{L}_3^{(j)}$ from $\mathcal{L}_1^{(j-1)}$, $\mathcal{L}_2^{(j-1)}$ and $\mathcal{L}_3^{(j-1)}$
is approximately proportional to $N$ times the total number of members of $\mathcal{L}_2^{(j-1)}$ and $\mathcal{L}_3^{(j-1)}$. Hence the
total complexity of the algorithm is approximately proportional to $Nm$ times the average of the total number
of members in $\mathcal{L}_2^{(j)}$ and $\mathcal{L}_3^{(j)}$ over all $1 \le j \le m - 1$. This implies that the order (with respect to $N$) of the
complexity of calculating $\mathcal{T}(y')$ can be reduced from $O(N^3)$ if the average number of pairs and triples in $\mathcal{L}_2^{(j)}$
and $\mathcal{L}_3^{(j)}$ is sufficiently small. The author guesses that the latter average number is indeed sufficiently small
in most practical cases, as the size of $\mathcal{T}(y')$ would not be large in the average case (provided the code
length $m$ is long enough to make the error probability of the fingerprint code sufficiently small). A detailed
analysis of this calculation method will be a future research topic. Instead, here we show some experimental
data for the running time of the above algorithm, which was implemented on an ordinary PC with a 1.83 GHz Intel
Core 2 CPU and 2 GB of memory. We chose parameters $N = 1000$, $m = 180$, $\varepsilon_0 = 0.001$, and adopted the
minority vote attack as the pirate strategy. Then the average running time of the algorithm over 10 trials was
4331.5 seconds, i.e., about 1 hour and 12 minutes, where the calculation of running times was restricted to
the case that the scores of all users are less than the threshold, as otherwise the tracing algorithm halts before
Step 5.



6 Proofs of the Propositions 
6.1 Proof of Proposition 1

First, we prove claim 1 of Proposition 1. For each $I \in U_I$ and $a \in \{H, L\}$, let $K_a = \{j \in A_a \mid w_{I,j} = y'_j\}$.
Then we have $S(I) = |K_H| \log(1/p) + |K_L| \log(1/(1-p))$. Now note that the choice of $y'$ is independent of
$w_I$. This implies that we have $\Pr[w_{I,j} = y'_j \mid y'] = p$ for each $j \in A_H$, and we have $\Pr[w_{I,j} = y'_j \mid y'] = 1 - p$
for each $j \in A_L$. Hence the conditional probability that $|K_H| = k_H$ and $|K_L| = k_L$, conditioned on this $y'$, is
$\binom{a_L}{k_L} (1-p)^{k_L} p^{a_L - k_L} \binom{a_H}{k_H} p^{k_H} (1-p)^{a_H - k_H}$. This implies that $\Pr[S(I) \ge Z \mid y']$ is equal to the left-hand side
of (5), therefore claim 1 holds as there exist at most $N - 1$ innocent users $I$.

Secondly, to prove claim 2 of Proposition 1, we use the following Hoeffding's Inequality:

Theorem 3 ([4, Theorem 2]). Let $X_1, X_2, \dots, X_n$ be independent random variables such that $a_i \le X_i \le b_i$
for each $i$. Let $\bar{X}$ be the average value of $X_1, \dots, X_n$. Then for $t > 0$, we have

    $\Pr[\bar{X} - E[\bar{X}] \ge t] \le \exp\left( \frac{-2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right)$ .

As mentioned above, the left-hand side of (5) is equal to $\Pr[S(I) \ge Z \mid y']$, where $I$ is any specified
innocent user. Now for each $j \in [m]$, let $X_j$ be a random variable such that

    $\Pr[X_j = \log(1/p)] = p$ , $\Pr[X_j = 0] = 1 - p$    if $j \in A_H$ ,
    $\Pr[X_j = \log(1/(1-p))] = 1 - p$ , $\Pr[X_j = 0] = p$    if $j \in A_L$ .

Then, conditioned on this $y'$, the variables $X_1, \dots, X_m$ are independent and $S(I) = m\bar{X}$. Now by a direct
calculation, we have $E[S(I) \mid y'] = m E[\bar{X} \mid y'] = \mu$ where $\mu = a_H p \log(1/p) + a_L (1-p) \log(1/(1-p))$.
Moreover, we have $0 \le X_j \le \log(1/p)$ if $j \in A_H$, and we have $0 \le X_j \le \log(1/(1-p))$ if $j \in A_L$. Hence
Theorem 3 implies that

    $\Pr[S(I) - \mu \ge mt \mid y'] \le \exp\left( \frac{-2 m^2 t^2}{a_H (\log(1/p))^2 + a_L (\log(1/(1-p)))^2} \right)$    (13)

for $t > 0$. Now by setting $t = \eta/m$ where

    $\eta = \sqrt{\frac{1}{2} \left( a_H (\log(1/p))^2 + a_L (\log(1/(1-p)))^2 \right) \log\frac{N}{\varepsilon_0}}$ ,    (14)

the right-hand side of (13) is equal to $\varepsilon_0/N$. On the other hand, for the left-hand side of (13), we have

    $\Pr[S(I) - \mu \ge mt \mid y'] = \Pr[S(I) \ge \mu + \eta \mid y']$ ,    (15)

while the value of $Z = Z_0$ in (6) is equal to $\mu + \eta$. Hence the condition (5) is satisfied, concluding the proof
of Proposition 1.

6.2 Proof of Proposition 2

To prove Proposition 2, suppose that none of the Type I to IV errors occurs. We show that tracing error does
not occur in this case. Recall that $T_P = 123 \in \mathcal{T}(y')$. By the absence of Type I error, it holds that either
some pirate and no innocent users are output in Step 4 of Tr, or $S(i) < Z$ for every $i \in U$ and nobody is
output in Step 4. It suffices to consider the latter case. We have $T_P \in \mathcal{T}'$ by the absence of Type II error.
Hence every $T \in \mathcal{T}'$ intersects $T_P$, and $\bigcap \mathcal{T}' \subseteq T_P$. By virtue of Step 6, it suffices to consider the case that
$\bigcap \mathcal{T}' = \emptyset$. Now there are the following two cases: (A) we have $|T \cap T_P| = 1$ for some $T \in \mathcal{T}'$; (B) we have
$|T \cap T_P| = 2$ for every $T \in \mathcal{T}' \setminus \{T_P\}$.



6.2.1 Case (A) 

Let $T_1 \in \mathcal{T}'$ and $|T_1 \cap T_P| = 1$. By symmetry, we may assume that $T_1 \cap T_P = \{1\}$. By the fact $\bigcap \mathcal{T}' = \emptyset$,
there is a $T_2 \in \mathcal{T}'$ such that $1 \notin T_2$. We may assume by symmetry that $2 \in T_2$, as $T_2 \cap T_P \neq \emptyset$. We have
$T_1 \cap T_2 \neq \emptyset$ as $T_1 \in \mathcal{T}'$, therefore the absence of Type III error implies that $3 \in T_2$. Put $T_2 = 23I$ with $I \in U_I$
and $T_1 = 1II'$ with $I' \in U_I$. Now if we calculate the set $\mathcal{P}$ by using $\{T_P, T_1, T_2\}$ instead of $\mathcal{T}'$, then the result
is

    $\{12, 13, 1I, 2I, 2I', 3I, 3I'\}$ .    (16)

In general, the actual set $\mathcal{P}$ is included in the set (16). Now we present two properties. First, we show that
$12, 13 \in \mathcal{P}$. Indeed, if $12 \notin \mathcal{P}$, then we have $12 \cap T = \emptyset$ for some $T \in \mathcal{T}'$. Now we have $3 \in T$ and $T_1 \cap T \neq \emptyset$
as $T \in \mathcal{T}'$, therefore $T_1$ and $T$ contradict the absence of Type III error. Hence we have $12 \in \mathcal{P}$, and $13 \in \mathcal{P}$
by symmetry. Secondly, we show that no innocent users are output in Step 8. Indeed, if an $I'' \in U_I$ is output
in Step 8, then the possibility of $\mathcal{P}$ mentioned above implies that $I'' \in \{I, I'\}$ and we have $i \in \mathcal{P}_1$ and $iI'' \in \mathcal{P}$
for some $i \in 123$. This is impossible, as $12, 13 \in \mathcal{P}$. Hence this claim holds, therefore it suffices to consider
the case that nobody is output in Step 8, namely $\mathcal{P}_1 = \emptyset$.
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By these properties, we have either 21', 31' <G V or 21', 31' ^ V (otherwise I' G V\, a contradiction). 
Similarly, we have V H {21,21'} ^ and V n {31,31'} ^ 0. First we consider the case that 21', 31' € "P. As 
Pi = 0, it does not hold that \V n {1I,2I,3I}| 7^ 1. If 11,21,31 <= 7>, then |P| = 7, P 2 = {I'}, and 2 and 3 
are output in Step [HI If \V H {II, 21, 3I}| = 2, then \V\ — 6, 7^ V3 C C/p and a pirate is correctly output in 
Step [TO] Finally, if 11,21,31 V, then |P| = 4 and V = {12, 13, 21', 31'}. Now I' is not output in Step EH as 
123 € V . Moreover, if none of 1, 2, and 3 is output in Step [[3 then it should hold that 121', 131', 231' € V, 
contradicting the absence of Type IV error. Hence a pirate is correctly output in Step [T31 concluding the 
proof in the case 21', 31' G V . 

Secondly, we suppose that 21', 31' ^ V, therefore 21, 31 € P. There are two possibilities V = {12, 13, 21, 31} 
and V = {12, 13, II, 21, 31}. The former case is the same as the previous paragraph. In the latter case, we 
have \V\ = 5, T" C {121, 131} and V 2 = 23. Hence 2 or 3 is correctly output in StepEQwhen T" 7^ 0. On the 
other hand, when T" = 0, 2 and 3 are correctly output in Step[T2J Hence the proof in the case 21', 31' ^ V 
(therefore in the case (A)) is concluded. 



6.2.2 Case (B) 

As $\bigcap \mathcal{T}' = \emptyset$, there are $I_1, I_2, I_3 \in U_I$ such that $12I_3, 13I_2, 23I_1 \in \mathcal{T}'$. By the absence of Type IV error, it does not hold that $I_1 = I_2 = I_3$. By symmetry, we may assume that $I_1 \ne I_2$. Then by calculating the set $\mathcal{P}$ by using $\{123, 12I_3, 13I_2, 23I_1\}$ instead of $\mathcal{T}'$, it follows that the actual $\mathcal{P}$ satisfies $\mathcal{P} \subset \{12, 13, 23, 1I_1, 2I_2, 3I_3\}$, while $12, 13, 23 \in \mathcal{P}$ by the assumption of the case (B). If $\mathcal{P} = \{12, 13, 23\}$, then 1, 2 and 3 are output by $\mathsf{Tr}$. Therefore it suffices to consider the case that $\{12, 13, 23\} \subsetneq \mathcal{P}$.

If $I_1 \ne I_3 \ne I_2$, then we have $\emptyset \ne \mathcal{P}_1 \subset I_1I_2I_3$ and a pirate is correctly output in Step 8. Hence it suffices to consider the remaining case. By symmetry, we may assume that $I_1 = I_3 \ne I_2$. If $2I_2 \in \mathcal{P}$, then we have $I_2 \in \mathcal{P}_1 \subset I_1I_2$, and 2 is correctly output in Step 8. From now, we assume that $2I_2 \notin \mathcal{P}$. If $1I_1 \notin \mathcal{P}$ or $3I_1 \notin \mathcal{P}$, then we have $\mathcal{P}_1 = \{I_1\}$ as $\{12, 13, 23\} \subsetneq \mathcal{P}$, therefore 1 or 3 is correctly output in Step 8. On the other hand, if $1I_1, 3I_1 \in \mathcal{P}$, then we have $\mathcal{P} = \{12, 13, 23, 1I_1, 3I_1\}$, while $13I_1 \notin \mathcal{T}'$ by the absence of Type IV error (note that $12I_1, 23I_1 \in \mathcal{T}'$), therefore $\mathcal{T}'' = \{123\}$, $\mathcal{P}_2 = 2I_1$ and 2 is correctly output in Step 11. Hence the proof in the case (B), and therefore the proof of Proposition 2, is concluded.
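The pair sets used in the two cases can be reproduced mechanically. The sketch below is only an illustration under two assumptions read off from the proof: that $\mathcal{P}$ consists of the pairs of users intersecting every triple of $\mathcal{T}'$, and that $\mathcal{P}_k$ is the set of users belonging to exactly $k$ pairs of $\mathcal{P}$. The function names and the string labels for users are hypothetical.

```python
from itertools import combinations

def hitting_pairs(triples):
    """Pairs of users that intersect every triple (assumed definition of P)."""
    users = sorted(set().union(*triples))
    return {
        frozenset(pair)
        for pair in combinations(users, 2)
        if all(set(pair) & triple for triple in triples)
    }

def users_with_degree(pairs, k):
    """Users contained in exactly k of the given pairs (assumed definition of P_k)."""
    users = set().union(*pairs) if pairs else set()
    return {u for u in users if sum(1 for pair in pairs if u in pair) == k}

if __name__ == "__main__":
    # Case (A): T_P = 123, T_1 = 1 I I', T_2 = 2 3 I
    case_a = [{"1", "2", "3"}, {"1", "I", "I'"}, {"2", "3", "I"}]
    print(sorted("".join(sorted(pair)) for pair in hitting_pairs(case_a)))
    # Case (B): 123, 12 I3, 13 I2, 23 I1 with distinct innocent users
    case_b = [{"1", "2", "3"}, {"1", "2", "I3"}, {"1", "3", "I2"}, {"2", "3", "I1"}]
    print(sorted("".join(sorted(pair)) for pair in hitting_pairs(case_b)))
```

Under these assumptions the first call returns the seven pairs listed in (16), and the second returns the six pairs bounding $\mathcal{P}$ in the case (B).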



6.3 Proof of Proposition 3

To prove Proposition 3, let $I_1$, $I_2$ and $I_3$ be three distinct innocent users. Given $y'$ and $\mathsf{st} = (p_j)_j$, we introduce the following notation for $j \in [m]$:

$$\xi_j^{H} = \begin{cases} 1 & \text{if } p_j = p, \\ 0 & \text{if } p_j = 1-p, \end{cases} \qquad \xi_j^{L} = 1 - \xi_j^{H}. \tag{17}$$

Note that the sets $A_\alpha$ for $\alpha \in \{H, L\}$ defined in Sect. 3 satisfy that $A_\alpha = \{ j \mid y'_j = \xi_j^{\alpha} \}$. We write $A_\alpha = A_\alpha(y', \mathsf{st})$ and $a_\alpha = |A_\alpha| = a_\alpha(y', \mathsf{st})$ when we emphasize the dependency on $y'$ and $\mathsf{st}$. Then, as the bits of codewords are independently chosen, we have

$$\Pr[I_1I_2I_3 \in \mathcal{T}(y') \mid y', \mathsf{st}] = (1 - p^3)^{a_L} (1 - (1-p)^3)^{a_H}, \tag{18}$$

therefore

$$\Pr[I_1I_2I_3 \in \mathcal{T}(y')] = \sum_{y', \mathsf{st}} \Pr[y', \mathsf{st}]\, (1 - p^3)^{a_L(y', \mathsf{st})} (1 - (1-p)^3)^{a_H(y', \mathsf{st})}. \tag{19}$$

Now we present the following key lemma, which will be proven later:

Lemma 1. Among the possible pirate strategies $\rho$, the maximum value of the right-hand side of (19) is attained by the majority vote attack, namely the attack word $y$ for codewords $w_1, w_2, w_3$ of three pirates satisfies that $y_j = 0$ if at least two of $w_{1,j}, w_{2,j}, w_{3,j}$ are 0 and $y_j = 1$ otherwise.

If $\rho$ is the majority vote attack, then for each $j \in [m]$, we have $j \in A_H(y', \mathsf{st})$ (i.e., $\xi_j^{H}$ becomes the majority in $w_{1,j}, w_{2,j}, w_{3,j}$) with probability $3p^2(1-p) + p^3 = 3p^2 - 2p^3$ and $j \in A_L(y', \mathsf{st})$ with probability $1 - 3p^2 + 2p^3$. This implies that

$$\begin{aligned}
\Pr[I_1I_2I_3 \in \mathcal{T}(y')]
&= \sum_{a_L + a_H = m} \Pr[a_L, a_H]\, (1 - p^3)^{a_L} (1 - (1-p)^3)^{a_H} \\
&= \sum_{a_L + a_H = m} \binom{m}{a_L} (1 - 3p^2 + 2p^3)^{a_L} (3p^2 - 2p^3)^{a_H} (1 - p^3)^{a_L} (1 - (1-p)^3)^{a_H} \\
&= \sum_{a_L + a_H = m} \binom{m}{a_L} (1 - 3p^2 + p^3 + 3p^5 - 2p^6)^{a_L} (9p^3 - 15p^4 + 9p^5 - 2p^6)^{a_H} \\
&= (1 - 3p^2 + 10p^3 - 15p^4 + 12p^5 - 4p^6)^m = f_1(p)^m.
\end{aligned} \tag{20}$$

By virtue of Lemma 1, for a general $\rho$, $\Pr[I_1I_2I_3 \in \mathcal{T}(y')]$ is bounded by the right-hand side of the above equality. This implies the claim of Proposition 3, as there are $\binom{N-3}{3}$ choices of the triple $I_1, I_2, I_3$.
To complete the proof of Proposition 3, we give a proof of Lemma 1.
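The collapse of the binomial sum in (20) into a single $m$-th power can be sanity-checked numerically. This is a minimal sketch with hypothetical function names; it assumes only the per-column probabilities and factors stated above for the majority vote attack.

```python
from math import comb, isclose

def f1(p):
    """Polynomial appearing in the last line of (20)."""
    return 1 - 3 * p**2 + 10 * p**3 - 15 * p**4 + 12 * p**5 - 4 * p**6

def majority_vote_sum(p, m):
    """Binomial sum of (20), evaluated term by term."""
    q_h = 3 * p**2 - 2 * p**3      # Pr[j in A_H] under majority vote
    q_l = 1 - q_h                  # Pr[j in A_L] under majority vote
    g_h = 1 - (1 - p) ** 3         # factor of (18) for a column in A_H
    g_l = 1 - p**3                 # factor of (18) for a column in A_L
    return sum(
        comb(m, a_l) * (q_l * g_l) ** a_l * (q_h * g_h) ** (m - a_l)
        for a_l in range(m + 1)
    )

if __name__ == "__main__":
    for p in (0.5, 0.6, 0.8):
        print(p, majority_vote_sum(p, 12), f1(p) ** 12)
```

The two printed columns agree up to floating-point rounding, which is the binomial theorem applied to the two per-column products.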

Proof of Lemma 1. Fix the codewords $w_1, w_2, w_3$ of the three pirates $1, 2, 3 \in U$. Let $w_P$ denote the collection of those three codewords. Let $j_0 \in [m]$ be the index of a detectable column. By symmetry, we may assume without loss of generality that $w_{1,j_0} = w_{2,j_0} = 0$ and $w_{3,j_0} = 1$. Now let $y^0$ be an arbitrary attack word such that $y^0_{j_0} = 0$, and let $y^1$ and $y^?$ be the attack words obtained from $y^0$ by changing the $j_0$-th column to 1 and to ?, respectively. We show that if the pirate strategy $\rho$ for the input $w_P$ is modified so that it outputs $y^0$ instead of $y^1$ and $y^?$, then the right-hand side of (19) will not decrease. As $w_P$, $j_0$ and $y^0$ are arbitrarily chosen, the claim of Lemma 1 then follows.

Let $y'^0$ be an $m$-bit word such that $y'^0_j = y^0_j$ for any $j \in [m]$ with $y^0_j \ne {?}$, therefore $y'^0$ is obtained from $y^0$ in Step 1 in the tracing algorithm with positive probability. Let $y'^1$ be the $m$-bit word obtained from $y'^0$ by changing the $j_0$-th column to 1. Moreover, let $\mathsf{st}^0 = (p_j)_j$ be any state information such that $p_{j_0} = 1 - p$, and let $\mathsf{st}^1$ be the state information obtained from $\mathsf{st}^0$ by changing the $j_0$-th component to $p$.

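In this case, by independence of the columns, we have $\Pr[w_P \mid \mathsf{st}^0] = \alpha p^2(1-p)$ and $\Pr[w_P \mid \mathsf{st}^1] = \alpha p(1-p)^2$ for a common $\alpha > 0$. As $\Pr[\mathsf{st}^0] = \Pr[\mathsf{st}^1] > 0$ and $\Pr[w_P] > 0$, Bayes Theorem implies that $\Pr[\mathsf{st}^0 \mid w_P] = \alpha' p^2(1-p)$ and $\Pr[\mathsf{st}^1 \mid w_P] = \alpha' p(1-p)^2$ for a common $\alpha' > 0$, therefore

$$\Pr[\mathsf{st}^0 \mid w_P, (\mathsf{st}^0 \text{ or } \mathsf{st}^1)] = \frac{p^2(1-p)}{p^2(1-p) + p(1-p)^2} = p \tag{21}$$

and $\Pr[\mathsf{st}^1 \mid w_P, (\mathsf{st}^0 \text{ or } \mathsf{st}^1)] = 1 - p$. Now there is a common $\beta > 0$ such that, for each $x \in \{0, 1\}$,

$$\begin{aligned}
\Pr[y'^0 \mid \mathsf{st}^x, y^0] &= \Pr[y'^1 \mid \mathsf{st}^x, y^1] = \beta, \\
\Pr[y'^0 \mid \mathsf{st}^x, y^1] &= \Pr[y'^1 \mid \mathsf{st}^x, y^0] = 0, \\
\Pr[y'^0 \mid \mathsf{st}^0, y^?] &= \Pr[y'^1 \mid \mathsf{st}^1, y^?] = \beta p, \\
\Pr[y'^0 \mid \mathsf{st}^1, y^?] &= \Pr[y'^1 \mid \mathsf{st}^0, y^?] = \beta(1-p).
\end{aligned} \tag{22}$$

As the choice of the attack word $y$ for given $w_P$ is independent of $\mathsf{st}$, and the choice of the word $y'$ will be independent of $w_P$ once the attack word $y$ is determined, it follows that

$$\Pr[y'^x, \mathsf{st}^{x'} \mid w_P, (\mathsf{st}^0 \text{ or } \mathsf{st}^1), y^{x''}] = \Pr[\mathsf{st}^{x'} \mid w_P, (\mathsf{st}^0 \text{ or } \mathsf{st}^1)]\, \Pr[y'^x \mid \mathsf{st}^{x'}, y^{x''}] \tag{23}$$

for $x, x' \in \{0, 1\}$ and $x'' \in \{0, 1, ?\}$. By these relations, writing $C$ for the condition "$w_P$, ($\mathsf{st}^0$ or $\mathsf{st}^1$)", we have

$$\begin{aligned}
\Pr[(y'^0, \mathsf{st}^0) \text{ or } (y'^1, \mathsf{st}^1) \mid C, y^0] &= p \cdot \beta + (1-p) \cdot 0 = p\beta, \\
\Pr[(y'^1, \mathsf{st}^0) \text{ or } (y'^0, \mathsf{st}^1) \mid C, y^0] &= (1-p)\beta, \\
\Pr[(y'^0, \mathsf{st}^0) \text{ or } (y'^1, \mathsf{st}^1) \mid C, y^1] &= p \cdot 0 + (1-p) \cdot \beta = (1-p)\beta, \\
\Pr[(y'^1, \mathsf{st}^0) \text{ or } (y'^0, \mathsf{st}^1) \mid C, y^1] &= p\beta, \\
\Pr[(y'^0, \mathsf{st}^0) \text{ or } (y'^1, \mathsf{st}^1) \mid C, y^?] &= p \cdot \beta p + (1-p) \cdot \beta p = p\beta, \\
\Pr[(y'^1, \mathsf{st}^0) \text{ or } (y'^0, \mathsf{st}^1) \mid C, y^?] &= (1-p)\beta.
\end{aligned} \tag{24}$$

Now note that $p \ge 1/2$, therefore we have $1 - p^3 \le 1 - (1-p)^3$ and $p\beta \ge (1-p)\beta$. Note also that $a_H(y'^0, \mathsf{st}^0) = a_H(y'^1, \mathsf{st}^1) = a_H(y'^0, \mathsf{st}^1) + 1 = a_H(y'^1, \mathsf{st}^0) + 1$. This implies that, in the case $\mathsf{st} \in \{\mathsf{st}^0, \mathsf{st}^1\}$, if the pirate strategy $\rho$ for the input $w_P$ is modified in such a way that it outputs $y^0$ instead of $y^1$ and $y^?$, then the right-hand side of (19) will not decrease. As this property is in fact independent of the choice of $\mathsf{st}^0$ and $\mathsf{st}^1$, the claim in the proof follows, concluding the proof of Lemma 1. □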
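The per-column comparison behind Lemma 1 can be illustrated with a small calculation. This is a sketch under stated assumptions rather than the proof itself: it considers a single detectable column whose majority bit is 0, uses the posterior (21), and assumes (as in (22)) that an undecodable bit is replaced by the more likely bit of the column with probability $p$. The function name is hypothetical.

```python
from math import isclose

def column_factor(p, output):
    """Expected per-column factor of (19) for the output symbol "0", "1" or "?"."""
    g_h = 1 - (1 - p) ** 3   # factor when the column falls into A_H
    g_l = 1 - p**3           # factor when the column falls into A_L
    post_st0 = p             # Pr[st^0 | w_P, st^0 or st^1], equation (21)
    if output == "0":
        pr_high = post_st0             # bit 0 is the likely bit exactly under st^0
    elif output == "1":
        pr_high = 1 - post_st0         # bit 1 is the likely bit exactly under st^1
    elif output == "?":
        # under either state, "?" becomes the likely bit with probability p
        pr_high = post_st0 * p + (1 - post_st0) * p
    else:
        raise ValueError("output must be '0', '1' or '?'")
    return pr_high * g_h + (1 - pr_high) * g_l

if __name__ == "__main__":
    for p in (0.5, 0.6, 0.9):
        print(p, [column_factor(p, s) for s in ("0", "1", "?")])
```

For every $p \ge 1/2$ the majority output "0" gives a factor at least as large as the minority output "1", and the same factor as "?", in line with (24).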



6.4 Proof of Proposition 4

To prove Proposition 4, we fix an innocent user $I_0 \in U_I$ and consider the probability that there are $T_1, T_2 \in \mathcal{T}(y')$ such that $I_0 \in T_1 \cap T_2 \subset U_I$, $T_1 \cap T_P = \{1\}$ and $T_2 \cap T_P = \{2\}$; or equivalently, there are innocent users $I_1, I_2 \in U_I \setminus \{I_0\}$ such that $1I_0I_1 \in \mathcal{T}(y')$ and $2I_0I_2 \in \mathcal{T}(y')$. We introduce some notations. Given $y'$, $w_1$, $w_2$, $w_{I_0}$, and $\mathsf{st} = (p_j)_j$, we define, for $\alpha, \beta, \gamma, \delta \in \{H, L\}$,

$$a_{\alpha\beta\gamma\delta} = |\{ j \in [m] \mid y'_j = \xi_j^{\alpha},\ w_{1,j} = \xi_j^{\beta},\ w_{2,j} = \xi_j^{\gamma},\ w_{I_0,j} = \xi_j^{\delta} \}| \tag{25}$$

(see (17) for the notations). Moreover, by using '*' as a wild-card, we extend naturally the definition of $a_{\alpha\beta\gamma\delta}$ to the case $\alpha, \beta, \gamma, \delta \in \{H, L, *\}$. For example, we have $a_{\alpha**\delta} = a_{\alpha HH\delta} + a_{\alpha HL\delta} + a_{\alpha LH\delta} + a_{\alpha LL\delta}$. Note that $a_{x***}$ ($x \in \{H, L\}$) is equal to the value $a_x$ in Sect. 3.
Now for an innocent user $I_1 \ne I_0$, we have

$$\Pr[1I_0I_1 \in \mathcal{T}(y') \mid y', w_1, w_2, w_{I_0}, \mathsf{st}] = p^{a_{HL*L}} (1-p)^{a_{LH*H}}. \tag{26}$$

Therefore we have

$$\Pr[1I_0I_1 \in \mathcal{T}(y') \text{ for some } I_1 \in U_I \mid y', w_1, w_2, w_{I_0}, \mathsf{st}] \le (N-4)\, p^{a_{HL*L}} (1-p)^{a_{LH*H}} \tag{27}$$

as there are $N - 4$ choices of $I_1$. Similarly, we have

$$\Pr[2I_0I_2 \in \mathcal{T}(y') \text{ for some } I_2 \in U_I \mid y', w_1, w_2, w_{I_0}, \mathsf{st}] \le (N-4)\, p^{a_{H*LL}} (1-p)^{a_{L*HH}}. \tag{28}$$

Hence the probability that $1I_0I_1, 2I_0I_2 \in \mathcal{T}(y')$ for some $I_1, I_2 \in U_I$, conditioned on the given $y'$, $w_1$, $w_2$, $w_{I_0}$, and $\mathsf{st}$, is at most the minimum of the two values $(N-4)\, p^{a_{HL*L}} (1-p)^{a_{LH*H}}$ and $(N-4)\, p^{a_{H*LL}} (1-p)^{a_{L*HH}}$, which is not higher than their geometric mean

$$\sqrt{(N-4)\, p^{a_{HL*L}} (1-p)^{a_{LH*H}} \cdot (N-4)\, p^{a_{H*LL}} (1-p)^{a_{L*HH}}} = (N-4)\, \sqrt{p}^{\,a_{HL*L} + a_{H*LL}}\, \sqrt{1-p}^{\,a_{LH*H} + a_{L*HH}}. \tag{29}$$



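Now given $y'$, $w_1$, $w_2$, and $\mathsf{st}$, the probability that $w_{I_0}$ attains the given values of $a_{HLLL}$, $a_{HLHL}$, $a_{HHLL}$, $a_{LHLH}$, $a_{LHHH}$, and $a_{LLHH}$ (denoted here by $\eta$) is the product of the following six values

$$\begin{aligned}
&\binom{a_{HLL*}}{a_{HLLL}} (1-p)^{a_{HLLL}}\, p^{a_{HLL*} - a_{HLLL}}, &
&\binom{a_{HLH*}}{a_{HLHL}} (1-p)^{a_{HLHL}}\, p^{a_{HLH*} - a_{HLHL}}, \\
&\binom{a_{HHL*}}{a_{HHLL}} (1-p)^{a_{HHLL}}\, p^{a_{HHL*} - a_{HHLL}}, &
&\binom{a_{LHL*}}{a_{LHLH}} p^{a_{LHLH}} (1-p)^{a_{LHL*} - a_{LHLH}}, \\
&\binom{a_{LHH*}}{a_{LHHH}} p^{a_{LHHH}} (1-p)^{a_{LHH*} - a_{LHHH}}, &
&\binom{a_{LLH*}}{a_{LLHH}} p^{a_{LLHH}} (1-p)^{a_{LLH*} - a_{LLHH}}.
\end{aligned} \tag{30}$$

By the above results, it follows that

$$\Pr[1I_0I_1, 2I_0I_2 \in \mathcal{T}(y') \text{ for some } I_1, I_2 \in U_I \mid y', w_1, w_2, \mathsf{st}] \le \sum \eta\, (N-4)\, \sqrt{p}^{\,2a_{HLLL} + a_{HLHL} + a_{HHLL}}\, \sqrt{1-p}^{\,a_{LHLH} + 2a_{LHHH} + a_{LLHH}} \tag{31}$$

where the sum runs over the possible values of $a_{HLLL}$, $a_{HLHL}$, $a_{HHLL}$, $a_{LHLH}$, $a_{LHHH}$, and $a_{LLHH}$. Now by the above definition of $\eta$, the summand in the right-hand side is the product of $N - 4$ and the following six values

$$\begin{aligned}
&\binom{a_{HLL*}}{a_{HLLL}} ((1-p)p)^{a_{HLLL}}\, p^{a_{HLL*} - a_{HLLL}}, &
&\binom{a_{HLH*}}{a_{HLHL}} ((1-p)\sqrt{p})^{a_{HLHL}}\, p^{a_{HLH*} - a_{HLHL}}, \\
&\binom{a_{HHL*}}{a_{HHLL}} ((1-p)\sqrt{p})^{a_{HHLL}}\, p^{a_{HHL*} - a_{HHLL}}, &
&\binom{a_{LHL*}}{a_{LHLH}} (p\sqrt{1-p})^{a_{LHLH}} (1-p)^{a_{LHL*} - a_{LHLH}}, \\
&\binom{a_{LHH*}}{a_{LHHH}} (p(1-p))^{a_{LHHH}} (1-p)^{a_{LHH*} - a_{LHHH}}, &
&\binom{a_{LLH*}}{a_{LLHH}} (p\sqrt{1-p})^{a_{LLHH}} (1-p)^{a_{LLH*} - a_{LLHH}}.
\end{aligned} \tag{32}$$

Then by the binomial theorem, the sum is equal to

$$(N-4)\, (p(2-p))^{a_{HLL*}}\, (p + (1-p)\sqrt{p})^{a_{HLH*} + a_{HHL*}}\, (1 - p + p\sqrt{1-p})^{a_{LHL*} + a_{LLH*}}\, ((1-p)(1+p))^{a_{LHH*}}. \tag{33}$$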
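Each of the four bases in (33) comes from applying the binomial theorem to one group of columns in (32). The following minimal sketch checks these collapses numerically; the function names and the dictionary keys are hypothetical labels introduced only for this illustration.

```python
from math import comb, sqrt, isclose

def binomial_sum(n, a, b):
    """Sum over k of C(n, k) * a**k * b**(n - k)."""
    return sum(comb(n, k) * a**k * b ** (n - k) for k in range(n + 1))

def collapsed_bases(p):
    """The bases of (33), keyed by the column groups they belong to."""
    return {
        "HLL*": p * (2 - p),
        "HLH*, HHL*": p + (1 - p) * sqrt(p),
        "LHL*, LLH*": 1 - p + p * sqrt(1 - p),
        "LHH*": (1 - p) * (1 + p),
    }

if __name__ == "__main__":
    p, n = 0.7, 6
    bases = collapsed_bases(p)
    print(binomial_sum(n, (1 - p) * p, p), bases["HLL*"] ** n)
    print(binomial_sum(n, (1 - p) * sqrt(p), p), bases["HLH*, HHL*"] ** n)
    print(binomial_sum(n, p * sqrt(1 - p), 1 - p), bases["LHL*, LLH*"] ** n)
    print(binomial_sum(n, p * (1 - p), 1 - p), bases["LHH*"] ** n)
```

Each printed pair agrees up to rounding, one pair per group of columns.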

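Given $y'$, $\mathsf{st}$, $w_1$, $w_2$, and $w_3$, we define, for $\alpha, \beta, \gamma, \delta \in \{H, L\}$,

$$b_{\alpha\beta\gamma\delta} = |\{ j \in [m] \mid y'_j = \xi_j^{\alpha},\ w_{1,j} = \xi_j^{\beta},\ w_{2,j} = \xi_j^{\gamma},\ w_{3,j} = \xi_j^{\delta} \}|. \tag{34}$$

Then by Marking Assumption, (33) is equal to

$$\begin{aligned}
&(N-4)\, (2p - p^2)^{b_{HLLH}}\, (p + (1-p)\sqrt{p})^{b_{HLHL} + b_{HLHH} + b_{HHLL} + b_{HHLH}} \\
&\qquad \cdot (1 - p + p\sqrt{1-p})^{b_{LHLL} + b_{LHLH} + b_{LLHL} + b_{LLHH}}\, (1 - p^2)^{b_{LHHL}} \\
&= (N-4)\, (2p - p^2)^{b_{HLLH}}\, (1 - p^2)^{b_{LHHL}}\, (p + (1-p)\sqrt{p})^{b_{HLHL} + b_{HHLH}} \\
&\qquad \cdot (1 - p + p\sqrt{1-p})^{b_{LHLH} + b_{LLHL}}\, (p + (1-p)\sqrt{p})^{b_{HLHH} + b_{HHLL}}\, (1 - p + p\sqrt{1-p})^{b_{LHLL} + b_{LLHH}}.
\end{aligned} \tag{35}$$

By writing the right-hand side of (35) as $\eta'$, it follows that

$$\Pr[1I_0I_1, 2I_0I_2 \in \mathcal{T}(y') \text{ for some } I_1, I_2 \in U_I \mid w_1, w_2, w_3] \le \sum_{y', \mathsf{st}} \Pr[y', \mathsf{st} \mid w_1, w_2, w_3]\, \eta'. \tag{36}$$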

Now we present the following key lemma, which will be proven later: 

Lemma 2. Among the possible pirate strategies $\rho$, the maximum value of the right-hand side of (36) is attained by the majority vote attack $\rho_{\mathrm{maj}}$ (cf. Lemma 1).

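By (36), we have

$$\begin{aligned}
\Pr[1I_0I_1, 2I_0I_2 \in \mathcal{T}(y') \text{ for some } I_1, I_2 \in U_I]
&\le \sum_{w_1, w_2, w_3} \Pr[w_1, w_2, w_3] \sum_{y', \mathsf{st}} \Pr[y', \mathsf{st} \mid w_1, w_2, w_3]\, \eta' \\
&= \sum_{y', \mathsf{st}, w_1, w_2, w_3} \Pr[y', \mathsf{st}, w_1, w_2, w_3]\, \eta'.
\end{aligned} \tag{37}$$

By virtue of Lemma 2, the maximum value of the right-hand side is attained by the majority vote attack $\rho_{\mathrm{maj}}$. Now for $\rho = \rho_{\mathrm{maj}}$, the word $y'$ is uniquely determined by $w_1$, $w_2$, and $w_3$, and we have $b_{HLLH} = b_{LHHL} = b_{HLHL} = b_{LHLH} = b_{HHLL} = b_{LLHH} = 0$, $b_{HHLH} = d_{HLH}$, $b_{LLHL} = d_{LHL}$, $b_{HLHH} = d_{LHH}$, and $b_{LHLL} = d_{HLL}$, where, for $\alpha, \beta, \gamma \in \{H, L\}$,

$$d_{\alpha\beta\gamma} = |\{ j \in [m] \mid w_{1,j} = \xi_j^{\alpha},\ w_{2,j} = \xi_j^{\beta},\ w_{3,j} = \xi_j^{\gamma} \}|. \tag{38}$$

This implies that

$$\eta' = (N-4)\, (p + (1-p)\sqrt{p})^{d_{LHH} + d_{HLH}}\, (1 - p + p\sqrt{1-p})^{d_{HLL} + d_{LHL}}. \tag{39}$$

Put $d_{\mathrm{other}} = m - d_{HLL} - d_{LHL} - d_{LHH} - d_{HLH}$. Now given $\mathsf{st}$, the probability that $w_1$, $w_2$ and $w_3$ attain the given values of $d_{HLL}$, $d_{LHL}$, $d_{LHH}$ and $d_{HLH}$ is

$$\binom{m}{d_{HLL},\, d_{LHL},\, d_{LHH},\, d_{HLH},\, d_{\mathrm{other}}} (p(1-p)^2)^{d_{HLL} + d_{LHL}}\, (p^2(1-p))^{d_{LHH} + d_{HLH}}\, (1 - 2p(1-p))^{d_{\mathrm{other}}} \tag{40}$$

which is independent of $\mathsf{st}$. This implies that

$$\begin{aligned}
&\sum_{y', \mathsf{st}, w_1, w_2, w_3} \Pr[y', \mathsf{st}, w_1, w_2, w_3]\, \eta' \\
&= \sum \binom{m}{d_{HLL},\, d_{LHL},\, d_{LHH},\, d_{HLH},\, d_{\mathrm{other}}} (N-4) \left( p(1-p)^2 (1 - p + p\sqrt{1-p}) \right)^{d_{HLL} + d_{LHL}} \\
&\qquad \cdot \left( p^2(1-p)(p + (1-p)\sqrt{p}) \right)^{d_{LHH} + d_{HLH}} (1 - 2p(1-p))^{d_{\mathrm{other}}}
\end{aligned} \tag{41}$$

(where the sum runs over the possible values of $d_{HLL}$, $d_{LHL}$, $d_{LHH}$, and $d_{HLH}$)

$$\begin{aligned}
&= \sum \binom{m}{d_L,\, d_H,\, d_{\mathrm{other}}} 2^{d_L + d_H} (N-4) \left( p(1-p)^{5/2} (p + \sqrt{1-p}) \right)^{d_L} \\
&\qquad \cdot \left( p^{5/2}(1-p)(1 - p + \sqrt{p}) \right)^{d_H} (1 - 2p + 2p^2)^{d_{\mathrm{other}}}
\end{aligned} \tag{42}$$

(where the sum runs over the possible values of $d_L = d_{HLL} + d_{LHL}$ and $d_H = d_{LHH} + d_{HLH}$)

$$= (N-4) \left( 2p(1-p)^{5/2}(p + \sqrt{1-p}) + 2p^{5/2}(1-p)(1 - p + \sqrt{p}) + 1 - 2p + 2p^2 \right)^m = (N-4) f_2(p)^m. \tag{43}$$

By the above argument, the value $\Pr[1I_0I_1, 2I_0I_2 \in \mathcal{T}(y') \text{ for some } I_1, I_2 \in U_I]$ for a general $\rho$ is also bounded by the above value. Hence Proposition 4 follows, by considering the number of choices of the pair 1, 2 and the innocent user $I_0$.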

To complete the proof of Proposition [4] we give a proof of Lemma [2] 
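The multinomial sum obtained from (39) and (40) under the majority vote attack can be evaluated by brute force for small $m$ and compared with a single $m$-th power. The sketch below is an illustration with hypothetical function names; the factor $N - 4$ is omitted, and the closed form is simply the multinomial theorem applied to the five column classes (two classes share each of the first two weights).

```python
from math import factorial, sqrt, isclose

def column_weights(p):
    """Per-column weights: (HLL or LHL), (LHH or HLH), and all other columns."""
    x = p * (1 - p) ** 2 * (1 - p + p * sqrt(1 - p))
    y = p**2 * (1 - p) * (p + (1 - p) * sqrt(p))
    z = 1 - 2 * p * (1 - p)
    return x, y, z

def multinomial_sum(p, m):
    """Brute-force sum over d_HLL, d_LHL, d_LHH, d_HLH (d_other is the rest)."""
    x, y, z = column_weights(p)
    total = 0.0
    for d_hll in range(m + 1):
        for d_lhl in range(m + 1 - d_hll):
            for d_lhh in range(m + 1 - d_hll - d_lhl):
                for d_hlh in range(m + 1 - d_hll - d_lhl - d_lhh):
                    d_other = m - d_hll - d_lhl - d_lhh - d_hlh
                    coeff = factorial(m) // (
                        factorial(d_hll) * factorial(d_lhl) * factorial(d_lhh)
                        * factorial(d_hlh) * factorial(d_other)
                    )
                    total += coeff * x ** (d_hll + d_lhl) * y ** (d_lhh + d_hlh) * z**d_other
    return total

def closed_form(p, m):
    """m-th power given by the multinomial theorem."""
    x, y, z = column_weights(p)
    return (2 * x + 2 * y + z) ** m

if __name__ == "__main__":
    print(multinomial_sum(0.7, 5), closed_form(0.7, 5))
```

The brute-force sum is only practical for small $m$, but it confirms the collapse without relying on the algebra.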

Proof of Lemma 2. First, note that $1/2 \le p < 1$, therefore $0 < 2p - p^2 < 1$, $0 < 1 - p^2 < 1$ and $0 < 1 - p + p\sqrt{1-p} \le p + (1-p)\sqrt{p} < 1$. Now by the definition (35) of $\eta'$, for each $j \in [m]$ such that $w_{1,j} = w_{2,j} \ne w_{3,j}$, the value of $\eta'$ is increased by setting the $j$-th bit of the attack word $y$ to be $w_{1,j}$ instead of $w_{3,j}$ or '?' (which makes the values of $b_{HLLH}$ and $b_{LHHL}$ smaller).

We consider the case that $w_{1,j} = w_{3,j} \ne w_{2,j}$. If $w_{1,j} = \xi_j^{H}$, then the contribution of the $j$-th column to the value $\eta'$ is $p + (1-p)\sqrt{p}$ when $y'_j = w_{1,j}$ and $1 - p + p\sqrt{1-p}$ when $y'_j = w_{2,j}$. On the other hand, if $w_{1,j} = \xi_j^{L}$, then the contribution of the $j$-th column to the value $\eta'$ is $1 - p + p\sqrt{1-p}$ when $y'_j = w_{1,j}$ and $p + (1-p)\sqrt{p}$ when $y'_j = w_{2,j}$. Recall the relation $1 - p + p\sqrt{1-p} \le p + (1-p)\sqrt{p}$. Now the same argument as Lemma 1 implies that $\Pr[w_{1,j} = \xi_j^{H}] = p \ge 1 - p = \Pr[w_{1,j} = \xi_j^{L}]$ in this case. This implies that the value of the right-hand side of (36) is not decreased by setting $y'_j$ to be $w_{1,j}$ instead of $w_{2,j}$ (the detail of the proof is similar to the proof of Lemma 1). Similarly, in the case that $w_{1,j} \ne w_{2,j} = w_{3,j}$, the value of the right-hand side of (36) is not decreased by setting $y'_j$ to be $w_{2,j}$ instead of $w_{1,j}$.

Summarizing, the value of the right-hand side of (36) is not decreased by setting $y'_j$ to be the majority of $w_{1,j}$, $w_{2,j}$, and $w_{3,j}$, instead of the minority of them. Hence the maximum value of the right-hand side of (36) is attained by the majority vote attack, concluding the proof of Lemma 2. □



6.5 Proof of Proposition 5

To prove Proposition 5, we fix an innocent user $I$ and suppose that $S(i) < Z$ for every $i \in 123$. Given $y'$, $w_1$, $w_2$, $w_3$, and $\mathsf{st}$, we define, for $\alpha, \beta, \gamma, \delta \in \{H, L\}$,

$$a_{\alpha\beta\gamma\delta} = |\{ j \in [m] \mid y'_j = \xi_j^{\alpha},\ w_{1,j} = \xi_j^{\beta},\ w_{2,j} = \xi_j^{\gamma},\ w_{3,j} = \xi_j^{\delta} \}|. \tag{44}$$

Then we have

$$\Pr[12I, 13I, 23I \in \mathcal{T}(y') \mid y', w_1, w_2, w_3, \mathsf{st}] = p^{a_{HLLH} + a_{HLHL} + a_{HHLL}} (1-p)^{a_{LLHH} + a_{LHLH} + a_{LHHL}}. \tag{45}$$

Let $a_L$ and $a_H$ be as defined in Sect. 3. For $x \in \{L, H\}$, let $a_x^{u}$ and $a_x^{d}$ be the number of indices $j \in [m]$ of undetectable and detectable columns, respectively, such that $y'_j = \xi_j^{x}$. Note that $a_x = a_x^{u} + a_x^{d}$, while we have $a_H^{u} = a_{HHHH}$ and $a_L^{u} = a_{LLLL}$ by Marking Assumption. Now we have

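$$\begin{aligned}
&S(1) + S(2) + S(3) \\
&= \bigl( 3a_{HHHH} + 2(a_{HLHH} + a_{HHLH} + a_{HHHL}) + a_{HLLH} + a_{HLHL} + a_{HHLL} \bigr) \log\frac{1}{p} \\
&\quad + \bigl( 3a_{LLLL} + 2(a_{LLLH} + a_{LLHL} + a_{LHLL}) + a_{LLHH} + a_{LHLH} + a_{LHHL} \bigr) \log\frac{1}{1-p} \\
&= a_H^{u} \log\frac{1}{p} + a_L^{u} \log\frac{1}{1-p} + 2\left( a_H \log\frac{1}{p} + a_L \log\frac{1}{1-p} \right) \\
&\quad - (a_{HLLH} + a_{HLHL} + a_{HHLL}) \log\frac{1}{p} - (a_{LLHH} + a_{LHLH} + a_{LHHL}) \log\frac{1}{1-p},
\end{aligned} \tag{46}$$

therefore

$$\begin{aligned}
&(a_{HLLH} + a_{HLHL} + a_{HHLL}) \log\frac{1}{p} + (a_{LLHH} + a_{LHLH} + a_{LHHL}) \log\frac{1}{1-p} \\
&= 2\left( a_H \log\frac{1}{p} + a_L \log\frac{1}{1-p} \right) + a_H^{u} \log\frac{1}{p} + a_L^{u} \log\frac{1}{1-p} - S(1) - S(2) - S(3) \\
&> 2\left( a_H \log\frac{1}{p} + a_L \log\frac{1}{1-p} \right) + a_H^{u} \log\frac{1}{p} + a_L^{u} \log\frac{1}{1-p} - 3Z_0
\end{aligned} \tag{47}$$

where we used the assumptions that $S(i) < Z$ for every $i \in 123$ and $Z \le Z_0$. By using the relation $a_L = m - a_H$ and the definition (6) of $Z_0$, the right-hand side of the above inequality is equal to

$$\begin{aligned}
&(3p-1)\, m \log\frac{1}{1-p} + a_H \left( (2-3p) \log\frac{1}{p} + (1-3p) \log\frac{1}{1-p} \right) + a_H^{u} \log\frac{1}{p} + a_L^{u} \log\frac{1}{1-p} \\
&\quad - 3 \sqrt{ \frac{1}{2} \left( \Bigl(\log\frac{1}{p}\Bigr)^2 a_H + \Bigl(\log\frac{1}{1-p}\Bigr)^2 a_L \right) \log\frac{N}{\varepsilon_0} } \\
&= (3p-1)\, m \log\frac{1}{1-p} + a_L^{u} \log\frac{1}{1-p} + a_H^{u} \left( (3-3p) \log\frac{1}{p} + (1-3p) \log\frac{1}{1-p} \right) \\
&\quad + a_H^{d} \left( (2-3p) \log\frac{1}{p} + (1-3p) \log\frac{1}{1-p} \right) \\
&\quad - 3 \sqrt{ \frac{1}{2} \left( \Bigl(\log\frac{1}{1-p}\Bigr)^2 m - \left( \Bigl(\log\frac{1}{1-p}\Bigr)^2 - \Bigl(\log\frac{1}{p}\Bigr)^2 \right) a_H \right) \log\frac{N}{\varepsilon_0} }
\end{aligned} \tag{48}$$

(where we used the relation $a_H = a_H^{u} + a_H^{d}$)

$$\begin{aligned}
&\ge (3p-1)\, m \log\frac{1}{1-p} + a_H^{u} \left( (3-3p) \log\frac{1}{p} + (1-3p) \log\frac{1}{1-p} \right) \\
&\quad + a_H^{d} \left( (2-3p) \log\frac{1}{p} + (1-3p) \log\frac{1}{1-p} \right) + a_L^{u} \log\frac{1}{1-p} - 3 \sqrt{ \frac{m}{2} \log\frac{N}{\varepsilon_0} }\, \log\frac{1}{1-p}
\end{aligned} \tag{49}$$

(where we used the fact $\log(1/(1-p)) \ge \log(1/p) > 0$). By applying the above inequalities to (45), we have

$$\begin{aligned}
&\Pr[12I, 13I, 23I \in \mathcal{T}(y') \mid y', w_1, w_2, w_3, \mathsf{st}] \\
&< (1-p)^{(3p-1)m}\, (1-p)^{-3\sqrt{(m/2)\log(N/\varepsilon_0)}}\, \left( p^{3-3p}(1-p)^{1-3p} \right)^{a_H^{u}} (1-p)^{a_L^{u}} \left( p^{2-3p}(1-p)^{1-3p} \right)^{a_H^{d}}.
\end{aligned} \tag{50}$$

We write the right-hand side of (50) as $\eta$. Then we have

$$\begin{aligned}
\Pr[12I, 13I, 23I \in \mathcal{T}(y') \mid w_1, w_2, w_3]
&\le \sum_{\substack{y', \mathsf{st} \\ S(1), S(2), S(3) < Z}} \Pr[y', \mathsf{st} \mid w_1, w_2, w_3]\, \eta \\
&\le \sum_{y', \mathsf{st}} \Pr[y', \mathsf{st} \mid w_1, w_2, w_3]\, \eta.
\end{aligned} \tag{51}$$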
Now we present the following key lemma, which will be proven later: 

Lemma 3. Among the possible pirate strategies $\rho$, the maximum value of the right-hand side of (51) is attained by the majority vote attack $\rho_{\mathrm{maj}}$ (cf. Lemma 1).

By (51), we have

$$\begin{aligned}
\Pr[12I, 13I, 23I \in \mathcal{T}(y')]
&\le \sum_{w_1, w_2, w_3} \Pr[w_1, w_2, w_3] \sum_{y', \mathsf{st}} \Pr[y', \mathsf{st} \mid w_1, w_2, w_3]\, \eta \\
&= \sum_{y', \mathsf{st}, w_1, w_2, w_3} \Pr[y', \mathsf{st}, w_1, w_2, w_3]\, \eta.
\end{aligned} \tag{52}$$

By virtue of Lemma 3, the maximum value of the right-hand side is attained by the majority vote attack $\rho_{\mathrm{maj}}$. Now for $\rho = \rho_{\mathrm{maj}}$ and given $\mathsf{st}$, the probability that $w_1$, $w_2$, $w_3$ and $y'$ attain the given values of $a_H^{u}$, $a_L^{u}$, and $a_H^{d}$ is

$$\binom{m}{a_H^{u},\, a_L^{u},\, a_H^{d},\, a_L^{d}} (p^3)^{a_H^{u}} ((1-p)^3)^{a_L^{u}} (3p^2(1-p))^{a_H^{d}} (3p(1-p)^2)^{a_L^{d}} \tag{53}$$

which is independent of $\mathsf{st}$, where we put $a_L^{d} = m - a_H^{u} - a_L^{u} - a_H^{d}$. Hence we have

$$\begin{aligned}
&\sum_{y', \mathsf{st}, w_1, w_2, w_3} \Pr[y', \mathsf{st}, w_1, w_2, w_3]\, \eta \\
&= \sum \binom{m}{a_H^{u},\, a_L^{u},\, a_H^{d},\, a_L^{d}} (p^3)^{a_H^{u}} ((1-p)^3)^{a_L^{u}} (3p^2(1-p))^{a_H^{d}} (3p(1-p)^2)^{a_L^{d}} \\
&\qquad \cdot (1-p)^{(3p-1)m} (1-p)^{-3\sqrt{(m/2)\log(N/\varepsilon_0)}} \left( p^{3-3p}(1-p)^{1-3p} \right)^{a_H^{u}} (1-p)^{a_L^{u}} \left( p^{2-3p}(1-p)^{1-3p} \right)^{a_H^{d}}
\end{aligned} \tag{54}$$

$$= (1-p)^{(3p-1)m} (1-p)^{-3\sqrt{(m/2)\log(N/\varepsilon_0)}} \sum \binom{m}{a_H^{u},\, a_L^{u},\, a_H^{d},\, a_L^{d}} \left( p^{6-3p}(1-p)^{1-3p} \right)^{a_H^{u}} \left( (1-p)^4 \right)^{a_L^{u}} \left( 3p^{4-3p}(1-p)^{2-3p} \right)^{a_H^{d}} \left( 3p(1-p)^2 \right)^{a_L^{d}}$$

(where the sum runs over the possible values of $a_H^{u}$, $a_L^{u}$, $a_H^{d}$, and $a_L^{d}$)

$$\begin{aligned}
&= (1-p)^{(3p-1)m} (1-p)^{-3\sqrt{(m/2)\log(N/\varepsilon_0)}} \left( p^{6-3p}(1-p)^{1-3p} + (1-p)^4 + 3p^{4-3p}(1-p)^{2-3p} + 3p(1-p)^2 \right)^m \\
&= (1-p)^{(3p-1)m} (1-p)^{-3\sqrt{(m/2)\log(N/\varepsilon_0)}} \left( p^{4-3p}(p^2 - 3p + 3)(1-p)^{1-3p} + (1-p)^2(p^2 + p + 1) \right)^m \\
&= (1-p)^{-3\sqrt{(m/2)\log(N/\varepsilon_0)}} f_3(p)^m.
\end{aligned} \tag{55}$$

By the above argument, the value $\Pr[12I, 13I, 23I \in \mathcal{T}(y')]$ for a general $\rho$ is also bounded by the above value. Hence Proposition 5 follows, as there exist $N - 3$ choices of the innocent user $I$.
To complete the proof of Proposition 5, we give a proof of Lemma 3.

Proof of Lemma 3. First note that, by Marking Assumption, the terms in $\eta$ other than $\left( p^{2-3p}(1-p)^{1-3p} \right)^{a_H^{d}}$ are independent of the choice of $y'$ for given $w_1$, $w_2$, and $w_3$. An elementary analysis shows that $p^{2-3p}(1-p)^{1-3p}$ is an increasing function of $p \in [1/2, 1)$, therefore $p^{2-3p}(1-p)^{1-3p} \ge (1/2)^{2-3/2}(1/2)^{1-3/2} = 1$. Hence the value of $\eta$ will be increased by making the value of $a_H^{d}$ as large as possible. By the same argument as Lemma 1, under the condition that the $j$-th column is detectable, the probabilities that the majority among $w_{1,j}$, $w_{2,j}$, and $w_{3,j}$ is $\xi_j^{H}$ and $\xi_j^{L}$ are $p$ and $1 - p$, respectively. In other words, the probabilities that $\xi_j^{H}$ is the majority and the minority among $w_{1,j}$, $w_{2,j}$, and $w_{3,j}$ are $p$ and $1 - p$, respectively. As $p \ge 1 - p$, it follows that the value of the right-hand side of (51) will not decrease by setting the $j$-th bit of $y'$ to be the majority of $w_{1,j}$, $w_{2,j}$, and $w_{3,j}$ instead of the minority of them (the detail of the proof is similar to the proof of Lemma 1). Hence the maximum value of the right-hand side of (51) is attained by the majority vote attack, concluding the proof of Lemma 3. □
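Two elementary facts used in this subsection lend themselves to a numerical sanity check: the factorisation of the four-term base obtained after the multinomial theorem, and the claim that $p^{2-3p}(1-p)^{1-3p}$ is increasing on $[1/2, 1)$ with value 1 at $p = 1/2$. The sketch below uses hypothetical function names and checks the monotonicity only on a finite grid, so it is an illustration rather than a proof.

```python
from math import isclose

def base_expanded(p):
    """Four-term base obtained from the multinomial theorem."""
    e = 3 * p
    return (
        p ** (6 - e) * (1 - p) ** (1 - e)
        + (1 - p) ** 4
        + 3 * p ** (4 - e) * (1 - p) ** (2 - e)
        + 3 * p * (1 - p) ** 2
    )

def base_factored(p):
    """The same base after collecting terms."""
    e = 3 * p
    return (
        p ** (4 - e) * (p**2 - 3 * p + 3) * (1 - p) ** (1 - e)
        + (1 - p) ** 2 * (p**2 + p + 1)
    )

def detectable_factor(p):
    """p^(2-3p) (1-p)^(1-3p), the per-column factor attached to a_H^d."""
    return p ** (2 - 3 * p) * (1 - p) ** (1 - 3 * p)

if __name__ == "__main__":
    grid = [0.5 + 0.05 * k for k in range(10)]   # 0.50, 0.55, ..., 0.95
    for p in grid:
        print(round(p, 2), base_expanded(p), base_factored(p), detectable_factor(p))
```

On the grid, the two forms of the base agree up to rounding and the last column increases from 1.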



6.6 Proof of Proposition 6



First we introduce some notations. Given the codewords $w_1$ and $w_2$ of the two pirates 1 and 2, let $a_u$ and $a_d$ denote the numbers of undetectable and detectable columns, respectively. Then by Marking Assumption and the choice $p = 1/2$, we have $S(1) + S(2) = (2a_u + a_d) \log 2$ regardless of the pirate strategy $\rho$. This implies that, if $S(1) < Z$ and $S(2) < Z$, then we have

$$(2a_u + a_d) \log 2 < 2Z \le 2Z_0 = m \log 2 + \sqrt{2m \log\frac{N}{\varepsilon_0}}\, \log 2. \tag{56}$$

By the relation $a_u + a_d = m$, this implies that $2m - a_d < m + \sqrt{2m \log(N/\varepsilon_0)}$, or equivalently $a_d - m/2 > m/2 - \sqrt{2m \log(N/\varepsilon_0)}$. Now for each $j \in [m]$, the probability that the $j$-th column becomes detectable is 1/2, therefore the expected value of $a_d$ is $m/2$. Then Hoeffding's Inequality (Theorem 3) implies that



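$$\begin{aligned}
\Pr[S(1) < Z \text{ and } S(2) < Z]
&\le \Pr\left[ a_d - m/2 > m/2 - \sqrt{2m \log(N/\varepsilon_0)} \right] \\
&\le \exp\left( -\frac{2}{m} \left( \frac{m}{2} - \sqrt{2m \log\frac{N}{\varepsilon_0}} \right)^2 \right)
= \exp\left( -\frac{1}{2} \left( \sqrt{m} - \sqrt{8 \log\frac{N}{\varepsilon_0}} \right)^2 \right)
\end{aligned} \tag{57}$$

provided $m/2 - \sqrt{2m \log(N/\varepsilon_0)} > 0$. The last condition is equivalent to $m > 8 \log(N/\varepsilon_0)$, which is satisfied under the condition (10). Now put $m = 8a \log(N/\varepsilon_0)$ with $a > 1$. Then, under the condition assumed on $m$, we have

$$\frac{1}{2} \left( \sqrt{m} - \sqrt{8 \log\frac{N}{\varepsilon_0}} \right)^2 = \frac{1}{2} \left( \sqrt{8a \log\frac{N}{\varepsilon_0}} - \sqrt{8 \log\frac{N}{\varepsilon_0}} \right)^2 = 4 (\sqrt{a} - 1)^2 \log\frac{N}{\varepsilon_0} \ge \log\frac{N}{\varepsilon_0}, \tag{58}$$

therefore the right-hand side of (57) is at most $\varepsilon_0/N$. Hence the proof of Proposition 6 is concluded.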
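The Hoeffding bound for the two-pirate case can be evaluated directly. The sketch below is an illustration with a hypothetical function name; it assumes the standard form of Hoeffding's inequality for $m$ independent $\{0,1\}$-valued indicators of detectable columns, which gives $\exp(-2s^2/m)$ for a deviation $s = m/2 - \sqrt{2m\log(N/\varepsilon_0)}$.

```python
import math

def two_pirate_bound(m, n_users, eps0):
    """exp(-(2/m) * (m/2 - sqrt(2 m log(N/eps0)))^2) for m > 8 log(N/eps0)."""
    big_l = math.log(n_users / eps0)
    deviation = m / 2 - math.sqrt(2 * m * big_l)
    if deviation <= 0:
        raise ValueError("the bound requires m > 8 log(N/eps0)")
    return math.exp(-2.0 * deviation**2 / m)

def two_pirate_bound_in_a(a, n_users, eps0):
    """Same bound written with m = 8 a log(N/eps0): exp(-4 (sqrt(a) - 1)^2 log(N/eps0))."""
    big_l = math.log(n_users / eps0)
    return math.exp(-4.0 * (math.sqrt(a) - 1.0) ** 2 * big_l)

if __name__ == "__main__":
    n_users, eps0 = 1000, 1e-3
    big_l = math.log(n_users / eps0)
    for a in (1.5, 2.25, 4.0):
        m = 8 * a * big_l
        print(a, two_pirate_bound(m, n_users, eps0), two_pirate_bound_in_a(a, n_users, eps0))
```

In this parametrisation the bound equals $\varepsilon_0/N$ exactly when $4(\sqrt{a} - 1)^2 = 1$, that is $a = 9/4$, and it is smaller for larger $a$.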



6.7 Proof of Proposition 7

Let $1 \in U$ be the unique pirate. Then by Marking Assumption and the choice $p = 1/2$, we have $y' = w_1$ and $S(1) = m \log 2$, while $Z \le Z_0 = (m/2) \log 2 + \sqrt{(m/2) \log(N/\varepsilon_0)}\, \log 2$. Now by the assumption $m \ge 2 \log(N/\varepsilon_0)$, we have

$$\sqrt{\frac{m}{2} \log\frac{N}{\varepsilon_0}} \le \sqrt{\frac{m}{2} \cdot \frac{m}{2}} = \frac{m}{2},$$

therefore $S(1) \ge Z_0 \ge Z$. Hence the proof of Proposition 7 is concluded.
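The single-pirate condition is easy to test numerically. This is a minimal sketch with a hypothetical function name; it assumes $p = 1/2$, the score $S(1) = m \log 2$ and the expression for $Z_0$ quoted in the proof.

```python
import math

def single_pirate_reaches_threshold(m, n_users, eps0):
    """True when S(1) = m log 2 is at least Z_0 for p = 1/2."""
    big_l = math.log(n_users / eps0)
    score = m * math.log(2)
    z0 = (m / 2) * math.log(2) + math.sqrt((m / 2) * big_l) * math.log(2)
    return score >= z0

if __name__ == "__main__":
    n_users, eps0 = 1000, 1e-3
    m_min = math.ceil(2 * math.log(n_users / eps0))
    print(m_min, single_pirate_reaches_threshold(m_min, n_users, eps0))
    print(1, single_pirate_reaches_threshold(1, n_users, eps0))
```

For any $m \ge 2\log(N/\varepsilon_0)$ the function returns `True`, while a very short code such as $m = 1$ fails the comparison for these sample parameters.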



7 Conclusion 

In this article, we proposed a new construction of probabilistic 3-secure codes and presented a theoretical 
evaluation of their error probabilities. A characteristic of our tracing algorithm is to make use of both 
score comparison and search of the triples of "parents" for a given pirated fingerprint word. Some numerical 
examples showed that code lengths of our proposed codes are significantly shorter than the previous provably 
secure 3-secure codes. Moreover, for the sake of improving efficiency of our tracing algorithm, we also 
proposed an implementation method for the algorithm, which seems indeed more efficient for an average 
case than the naive implementation. A detailed evaluation of the proposed implementation method will be 
a future research topic. 